Abstract

In this paper we improve existing results in the field of compressed sensing and matrix
completion when sampled data may be grossly corrupted. We introduce three new theorems.
1) In compressed sensing, we show that if the $m \times n$ sensing matrix has independent Gaussian
entries, then one can recover a sparse signal $x$ exactly by tractable $\ell_1$ minimization even if a
positive fraction of the measurements are arbitrarily corrupted, provided the number of nonzero
entries in $x$ is $O(m/(\log(n/m) + 1))$. 2) In the very general sensing model introduced in [7]
and assuming a positive fraction of corrupted measurements, exact recovery still holds if the
signal now has $O(m/\log^2 n)$ nonzero entries. 3) Finally, we prove that one can recover an
$n \times n$ low-rank matrix from $m$ corrupted sampled entries by tractable optimization provided
the rank is on the order of $O(m/(n \log^2 n))$; again, this holds when there is a positive fraction
of corrupted samples.

Keywords. Compressed Sensing, Matrix Completion, Robust PCA, Convex Optimization, 
Restricted Isometry Property, Golfing Scheme. 

1 Introduction 

1.1 Introduction on Compressed Sensing with Corruptions 

Compressed sensing (CS) has been well-studied in recent years [9, 19]. This novel theory asserts that
a sparse or approximately sparse signal $x \in \mathbb{R}^n$ can be acquired by taking just a few non-adaptive
linear measurements. This fact has numerous consequences which are being explored in a number
of fields of applied science and engineering. In CS, the acquisition procedure is often represented
as $y = Ax$, where $A \in \mathbb{R}^{m \times n}$ is called the sensing matrix and $y \in \mathbb{R}^m$ is the vector of measurements
or observations. It is now well-established that the solution $\hat{x}$ to the optimization problem

\[ \min_{\tilde{x}} \|\tilde{x}\|_1 \quad \text{such that} \quad A\tilde{x} = y, \tag{1.1} \]

is guaranteed to be the original signal $x$ with high probability, provided $x$ is sufficiently sparse and $A$
obeys certain conditions. A typical result is this: if $A$ has iid Gaussian entries, then exact recovery
occurs provided $\|x\|_0 \le Cm/(\log(n/m) + 1)$ [10, 18, 37] for some positive numerical constant $C > 0$.
Here is another example: if $A$ is a matrix with rows randomly selected from the DFT matrix, the
condition becomes $\|x\|_0 \le Cm/\log n$ [9].



This paper discusses a natural generalization of CS, which we shall refer to as compressed sensing 
with corruptions. We assume that some entries of the data vector y are totally corrupted but we
have absolutely no idea which entries are unreliable. We still want to recover the original signal 
efficiently and accurately. Formally, we have the mathematical model 



\[ y = Ax + f = [A, I] \begin{bmatrix} x \\ f \end{bmatrix}, \tag{1.2} \]

where $x \in \mathbb{R}^n$ and $f \in \mathbb{R}^m$. The number of nonzero coefficients in $x$ is $\|x\|_0$ and similarly for $f$. As
in the above model, $A$ is an $m \times n$ sensing matrix, usually sampled from a probability distribution.
The problem of recovering $x$ (and hence $f$) from $y$ has been recently studied in the literature in
connection with some interesting applications. We discuss a few of them.

• Clipping. Signal clipping frequently appears because of nonlinearities in the acquisition device
[27, 38]. Here, one typically measures $g(Ax)$ rather than $Ax$, where $g$ is always a nonlinear
map. Letting $f = g(Ax) - Ax$, we thus observe $y = Ax + f$. Nonlinearities usually occur at
large amplitudes so that for those components with small amplitudes, we have $f = g(Ax) -
Ax = 0$. This means that $f$ is sparse and, therefore, our model is appropriate. Just as before,
locating the portion of the data vector that has been clipped may be difficult because of
additional noise.

• CS for networked data. In a sensor network, different sensors will collect measurements of
the same signal $x$ independently (they each measure $z_i = \langle a_i, x \rangle$) and send the outcome
to a center hub for analysis [23, 30]. By setting $a_i^*$ as the row vectors of $A$, this is just
$z = Ax$. However, typically some sensors will fail to send the measurements correctly, and
will sometimes report totally meaningless measurements. Therefore, we collect $y = Ax + f$,
where $f$ models recording errors.
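
The clipping application can be illustrated numerically. The following is a minimal sketch, assuming a symmetric hard-saturation map for $g$ (the text only requires $g$ to be nonlinear) and hypothetical names `clipping_corruption` and `level`; it shows that the implied corruption $f = g(Ax) - Ax$ vanishes wherever the measurement amplitude stays below the saturation level.

```python
import numpy as np

def clipping_corruption(A, x, level):
    """Return (y, f) with y = g(Ax) for a hard-clipping g, and f = y - Ax.

    The saturation map g(z) = clip(z, -level, level) is an illustrative
    assumption; any device nonlinearity acting only at large amplitudes
    would lead to a sparse f in the same way.
    """
    z = A @ x                        # ideal linear measurements
    y = np.clip(z, -level, level)    # what the saturating device reports
    f = y - z                        # exactly zero wherever |z| <= level
    return y, f

rng = np.random.default_rng(0)
m, n = 50, 200
A = rng.standard_normal((m, n)) / np.sqrt(m)
x = np.zeros(n)
x[rng.choice(n, size=5, replace=False)] = rng.standard_normal(5)

z = A @ x
level = np.quantile(np.abs(z), 0.9)  # saturate roughly the largest 10% of amplitudes
y, f = clipping_corruption(A, x, level)
print("clipped measurements:", np.count_nonzero(f), "out of", m)
```

Only the support size of `f` matters for the model (1.2); the recovery procedure never needs to know which entries were clipped.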

There have been several theoretical papers investigating the exact recovery method for CS with
corruptions [28, 30, 38, 40], and all of them consider the following recovery procedure in the noiseless
case:

\[ \min_{\tilde{x}, \tilde{f}} \|\tilde{x}\|_1 + \lambda(m, n)\|\tilde{f}\|_1 \quad \text{such that} \quad A\tilde{x} + \tilde{f} = [A, I] \begin{bmatrix} \tilde{x} \\ \tilde{f} \end{bmatrix} = y. \tag{1.3} \]

We will compare them with our results in Section 1.4.

1.2 Introduction on matrix completion with corruptions

Matrix completion (MC) bears some similarity with CS. Here, the goal is to recover a low-rank
matrix $L \in \mathbb{R}^{n \times n}$ from a small fraction of linear measurements. For simplicity, we suppose the
matrix is square as above (the general case is similar). The standard model is that we observe
$\mathcal{P}_O(L)$ where $O \subset [n] \times [n] := \{1, \dots, n\} \times \{1, \dots, n\}$ and

\[ [\mathcal{P}_O(L)]_{ij} = \begin{cases} L_{ij} & \text{if } (i, j) \in O, \\ 0 & \text{otherwise.} \end{cases} \]

The problem is to recover the original matrix $L$, and there have been many papers studying this
problem in recent years, see [8, 12, 21, 26, 33], for example. Here one minimizes the nuclear norm,
the sum of all the singular values [20], to recover the original low-rank matrix. We discuss below
an improved result due to Gross [21] (with a slight difference).

Define $O \sim \mathrm{Ber}(\rho)$ for some $0 < \rho < 1$ by meaning that $1_{\{(i,j) \in O\}}$ are iid Bernoulli random
variables with parameter $\rho$. Then the solution $\hat{L}$ to

\[ \min_{\tilde{L}} \|\tilde{L}\|_* \quad \text{such that} \quad \mathcal{P}_O(\tilde{L}) = \mathcal{P}_O(L), \tag{1.4} \]

is guaranteed to be exactly $L$ with high probability, provided $\rho \ge C_\rho \frac{\mu r \log^2 n}{n}$. Here, $C_\rho$ is a positive
numerical constant, $r$ is the rank of $L$, and $\mu$ is an incoherence parameter introduced in [8] which
depends only on $L$.
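
The objects in this model are easy to compute directly. The sketch below is illustrative only: the helper names `nuclear_norm`, `sample_bernoulli_mask` and `project_onto` are hypothetical, and it evaluates the nuclear norm as the sum of singular values and applies the sampling operator to a Bernoulli-sampled index set.

```python
import numpy as np

def nuclear_norm(M):
    """Sum of the singular values of M."""
    return np.linalg.svd(M, compute_uv=False).sum()

def sample_bernoulli_mask(n, rho, rng):
    """Boolean n-by-n mask whose entries are iid Bernoulli(rho)."""
    return rng.random((n, n)) < rho

def project_onto(mask, M):
    """Sampling operator: keep the entries of M on the mask, zero elsewhere."""
    return np.where(mask, M, 0.0)

rng = np.random.default_rng(1)
n, r = 30, 2
L = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))  # rank-r matrix
mask = sample_bernoulli_mask(n, 0.4, rng)
observed = project_onto(mask, L)
print("nuclear norm of L:", nuclear_norm(L))
print("fraction of observed entries:", mask.mean())
```

Solving (1.4) itself requires a semidefinite or proximal solver, which is outside the scope of this sketch.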

This paper is concerned with the situation in which some entries may have been corrupted.
Therefore, our model is that we observe

\[ \mathcal{P}_O(L) + S, \tag{1.5} \]

where $O$ and $L$ are the same as before and $S \in \mathbb{R}^{n \times n}$ is supported on $\Omega \subset O$. Just as in CS, this
model has broad applicability. For example, Wu et al. used this model in photometric stereo [42].
This problem has also been introduced in [3] and is related to recent work in separating a low-rank
from a sparse component [U[lU[l4l[24ll!3]. A typical result is that the solution $(\hat{L}, \hat{S})$ to

\[ \min_{\tilde{L}, \tilde{S}} \|\tilde{L}\|_* + \lambda(m, n)\|\tilde{S}\|_1 \quad \text{such that} \quad \mathcal{P}_O(\tilde{L}) + \tilde{S} = \mathcal{P}_O(L) + S, \tag{1.6} \]

is guaranteed to be the true pair $(L, S)$ with high probability under some assumptions about
$L$, $O$, $S$ [HUE]. We will compare them with our result in Section 1.4.

1.3 Main results 

This section introduces three models and three corresponding recovery results. The proofs of these 
results are deferred to Section 2 for Theorem 1.1, Section 3 for Theorem 1.2 and Section 4 for 
Theorem 1.3. 

1.3.1 CS with iid matrices [Model 1] 

Theorem 1.1 Suppose that $A$ is an $m \times n$ ($m < n$) random matrix whose entries are iid Gaussian
variables with mean 0 and variance $1/m$, the signal to acquire is $x \in \mathbb{R}^n$, and our observation is
$y = Ax + f + w$ where $f, w \in \mathbb{R}^m$ and $\|w\|_2 \le \epsilon$. Then by choosing $\lambda(n, m) = \frac{1}{\sqrt{\log(n/m) + 1}}$, the
solution $(\hat{x}, \hat{f})$ to

\[ \min_{\tilde{x}, \tilde{f}} \|\tilde{x}\|_1 + \lambda\|\tilde{f}\|_1 \quad \text{such that} \quad \|(A\tilde{x} + \tilde{f}) - y\|_2 \le \epsilon \tag{1.7} \]

satisfies $\|\hat{x} - x\|_2 + \|\hat{f} - f\|_2 \le K\epsilon$ with probability at least $1 - C\exp(-cm)$. This holds universally;
that is to say, for all vectors $x$ and $f$ obeying $\|x\|_0 \le \alpha m/(\log(n/m) + 1)$ and $\|f\|_0 \le \alpha m$. Here $\alpha$,
$C$, $c$ and $K$ are numerical constants.

In the above statement, the matrix $A$ is random. Everything else is deterministic. The reader will
notice that the number of nonzero entries is on the same order as that needed for recovery from
clean data [5, 10, 19, 37], while the condition on $f$ implies that one can tolerate a constant fraction of
possibly adversarial errors. Moreover, our convex optimization is related to LASSO [35] and Basis
Pursuit [15].
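
In the noiseless case $\epsilon = 0$, the program (1.7) is a linear program and can be handed to a generic LP solver. The following is a minimal sketch under that assumption, using `scipy.optimize.linprog` with the standard positive/negative variable split; the function name `l1_recover` and the problem sizes are illustrative choices, not part of the paper. The demo only prints the recovery error and does not assert exact recovery, which the theorem guarantees only in suitable regimes of sparsity and dimension.

```python
import numpy as np
from scipy.optimize import linprog

def l1_recover(A, y, lam):
    """Solve min ||x||_1 + lam * ||f||_1 subject to A x + f = y as an LP.

    Variables are split as x = xp - xn and f = fp - fn with all parts >= 0,
    so that the l1 norms become linear in (xp, xn, fp, fn).
    """
    m, n = A.shape
    cost = np.concatenate([np.ones(2 * n), lam * np.ones(2 * m)])
    identity = np.eye(m)
    A_eq = np.hstack([A, -A, identity, -identity])
    res = linprog(cost, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    z = res.x
    x_hat = z[:n] - z[n:2 * n]
    f_hat = z[2 * n:2 * n + m] - z[2 * n + m:]
    return x_hat, f_hat

rng = np.random.default_rng(2)
m, n = 60, 120
A = rng.standard_normal((m, n)) / np.sqrt(m)      # iid N(0, 1/m) entries
x = np.zeros(n)
x[rng.choice(n, size=3, replace=False)] = rng.standard_normal(3)
f = np.zeros(m)
f[rng.choice(m, size=4, replace=False)] = 10.0 * rng.standard_normal(4)  # gross errors
y = A @ x + f

lam = 1.0 / np.sqrt(np.log(n / m) + 1.0)          # the choice of Theorem 1.1
x_hat, f_hat = l1_recover(A, y, lam)
print("recovery error:", np.linalg.norm(x_hat - x) + np.linalg.norm(f_hat - f))
```

For $\epsilon > 0$ the constraint becomes a second-order cone constraint and a conic solver would be needed instead.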



1.3.2 CS with general sensing matrices [Model 2] 

In this model, $m < n$ and

\[ A = \frac{1}{\sqrt{m}} \begin{bmatrix} a_1^* \\ \vdots \\ a_m^* \end{bmatrix}, \]

where $a_1, \dots, a_m$ are $m$ iid copies of a random vector $a$ whose distribution obeys the following two
properties: 1) $\mathbb{E}aa^* = I$; 2) $\|a\|_\infty \le \sqrt{\mu}$. This model has been introduced in [7] and includes a lot
of the stochastic models used in the literature. Examples include partial DFT matrices, matrices
with iid entries, certain random convolutions [34] and so on.



In this model, we assume that $x$ and $f$ in (1.2) have fixed supports denoted by $T$ and $B$,
with cardinalities $|T| = s$ and $|B| = m_b$. In the remainder of the paper, $x_T$ is the restriction of $x$
to indices in $T$ and $f_B$ is the restriction of $f$ to $B$. Our main assumption here concerns the sign
sequences: the sign sequences of $x_T$ and $f_B$ are independent of each other, and each is a sequence
of symmetric iid $\pm 1$ variables.



Theorem 1.2 For the model above, the solution $(\hat{x}, \hat{f})$ to (1.3), with $\lambda(n, m) = 1/\sqrt{\log n}$, is exact
with probability at least $1 - Cn^{-3}$, provided that $s \le \alpha\frac{m}{\mu \log^2 n}$ and $m_b \le \beta\frac{m}{\mu}$. Here $C$, $\alpha$ and $\beta$ are
some numerical constants.

Above, $x$ and $f$ have fixed supports and random signs. However, by a recent de-randomization
technique first introduced in [4], exact recovery with random supports and fixed signs would also
hold. We will explain this de-randomization technique in the proof of Theorem 1.3. In some specific
models, such as independent rows from the DFT matrix, $\mu$ could be a numerical constant, which
implies the proportion of corruptions is also a constant. An open problem is whether Theorem 1.2
still holds in the case where $x$ and $f$ have both fixed supports and signs. Another open problem is
to know whether the result would hold under more general conditions about $A$ as in [6] in the case
where $x$ has both random support and random signs.

We emphasize that the sparsity condition $\|x\|_0 \le C\frac{m}{\mu \log^2 n}$ is a little stronger than the optimal
result available in the noise-free literature (JSE]), namely, $\|x\|_0 \le C\frac{m}{\mu \log n}$. The extra logarithmic
factor appears to be important in the proof which we will explain in Section 3, and a third open
problem is whether or not it is possible to remove this factor.

Here we do not give a sensitivity analysis for the recovery procedure as in Model 1. In fact,
by applying a method similar to that of [7] to our argument in Section 3, a very good error
bound could be obtained in the noisy case. However, this would add little technical novelty and
would make the paper very long. Therefore we only discuss the noiseless case and focus on the
sampling rate and corruption ratio.
1.3.3 MC from corrupted entries [Model 3]

We assume $L$ is of rank $r$ and write its reduced SVD as $L = U\Sigma V^*$, where $U, V \in \mathbb{R}^{n \times r}$ and
$\Sigma \in \mathbb{R}^{r \times r}$. Let $\mu$ be the smallest quantity such that for all $1 \le i \le n$,

\[ \|UU^* e_i\|_2^2 \le \frac{\mu r}{n}, \quad \|VV^* e_i\|_2^2 \le \frac{\mu r}{n}, \quad \text{and} \quad \|UV^*\|_\infty \le \frac{\sqrt{\mu r}}{n}. \]

This model is the same as that originally introduced in [8], and later used in [4, 12, 16, 21, 32]. We
observe $\mathcal{P}_O(L) + S$, where $O \subset [n] \times [n]$ and $S$ is supported on $\Omega \subset O$. Here we assume that $O$, $\Omega$, $S$
satisfy the following model:



Model 3.1:

1. Fix an $n$ by $n$ matrix $K$, whose entries are either 1 or $-1$.

2. Define $O \sim \mathrm{Ber}(\rho)$ for a constant $\rho$ satisfying $0 < \rho < \frac{1}{2}$. Specifically speaking, $1_{\{(i,j) \in O\}}$ are iid
Bernoulli random variables with parameter $\rho$.

3. Conditioning on $(i, j) \in O$, assume that $(i, j) \in \Omega$ are independent events with $\mathbb{P}((i, j) \in
\Omega \mid (i, j) \in O) = s$. This implies that $\Omega \sim \mathrm{Ber}(\rho s)$.

4. Define $\Gamma := O \setminus \Omega$. Then we have $\Gamma \sim \mathrm{Ber}(\rho(1 - s))$.

5. Let $S$ be supported on $\Omega$, and $\mathrm{sgn}(S) := \mathcal{P}_\Omega(K)$.
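
The five steps of Model 3.1 can be simulated directly, which may help in reading the roles of the observed set, the corrupted set and the clean set. This is a minimal sketch: the function name `sample_model_3_1` and the `magnitude` argument are hypothetical, since the model fixes only the signs of $S$ and leaves its magnitudes free.

```python
import numpy as np

def sample_model_3_1(L, K, rho, s, magnitude, rng):
    """Draw the observed set, corrupted set, clean set, S and the data.

    `magnitude` (a positive scalar or array) is an illustrative assumption;
    Model 3.1 only prescribes sgn(S) on the corrupted set.
    """
    n = L.shape[0]
    observed = rng.random((n, n)) < rho               # step 2: O ~ Ber(rho)
    corrupted = observed & (rng.random((n, n)) < s)   # step 3: Omega ~ Ber(rho * s)
    clean = observed & ~corrupted                     # step 4: Gamma = O \ Omega
    S = np.where(corrupted, magnitude * K, 0.0)       # step 5: sgn(S) = P_Omega(K)
    data = np.where(observed, L, 0.0) + S             # the observation P_O(L) + S
    return observed, corrupted, clean, S, data

rng = np.random.default_rng(4)
n = 20
L = np.outer(rng.standard_normal(n), rng.standard_normal(n))  # rank-1 matrix
K = rng.choice([-1, 1], size=(n, n))                           # step 1: fixed sign matrix
observed, corrupted, clean, S, data = sample_model_3_1(L, K, 0.4, 0.1, 10.0, rng)
print("observed:", observed.sum(), "corrupted:", corrupted.sum(), "clean:", clean.sum())
```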



Theorem 1.3 Under Model 3.1, suppose $\rho \ge C_\rho \frac{\mu r \log^2 n}{n}$ and $s \le C_s$. Moreover, suppose $\lambda :=
\frac{1}{\sqrt{\rho n \log n}}$ and denote $(\hat{L}, \hat{S})$ as the optimal solution to the problem (1.6). Then we have $(\hat{L}, \hat{S}) =
(L, S)$ with probability at least $1 - Cn^{-3}$ for some numerical constant $C$, provided the numerical
constant $C_s$ is sufficiently small and $C_\rho$ is sufficiently large.

In this model $O$ is available while $\Omega$, $\Gamma$ and $S$ are not known explicitly from the observation
$\mathcal{P}_O(L) + S$. By the assumption $O \sim \mathrm{Ber}(\rho)$, we can use $|O|/n^2$ to approximate $\rho$. From the
following proof we can see that $\lambda$ is not required to be $\frac{1}{\sqrt{\rho n \log n}}$ exactly for the exact recovery. The
power of our result is that one can recover a low-rank matrix from a nearly minimal number of
samples even when a constant proportion of these samples has been corrupted.

We only discuss the noiseless case for this model. Actually by a method similar to [6], a subopti- 
mal estimation error bound can be obtained by a slight modification of our argument. However, 
it is of little interest technically and beyond the optimal result when n is large. There are other 
suboptimal results for matrix completion with noise, such as pQ, but the error bound is not tight 
when the additional noise is small. We want to focus on the noiseless case in this paper and leave 
the problem with noise for future work. 

The values of $\lambda$ are chosen to give a theoretical guarantee of exact recovery in Theorems 1.1, 1.2 and
1.3. In practice, $\lambda$ is usually chosen by cross-validation.
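
For reference, the three theoretical choices of $\lambda$ can be written as small helper functions. The function names are hypothetical, and natural logarithms are assumed throughout.

```python
import math

def lambda_model1(n, m):
    """Theorem 1.1: lambda = 1 / sqrt(log(n/m) + 1)."""
    return 1.0 / math.sqrt(math.log(n / m) + 1.0)

def lambda_model2(n):
    """Theorem 1.2: lambda = 1 / sqrt(log n)."""
    return 1.0 / math.sqrt(math.log(n))

def lambda_model3(rho, n):
    """Theorem 1.3: lambda = 1 / sqrt(rho * n * log n)."""
    return 1.0 / math.sqrt(rho * n * math.log(n))

print(lambda_model1(n=4000, m=1000), lambda_model2(n=4000), lambda_model3(rho=0.3, n=4000))
```

In the third model $\rho$ is not known exactly, so one would plug in the empirical sampling rate $|O|/n^2$.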



1.4 Comparison with existing results, relative works and our contribution 

In this section we will compare Theorems 1.1, 1.2 and 1.3 with existing results in the literature.
We begin with Model 1. In [30], Wright and Ma discussed a model where the sensing matrix
$A$ has independent columns with common mean $\mu$ and normal perturbations with variance
$\sigma^2/m$. They chose $\lambda(m, n) = 1$, and proved that $(\hat{x}, \hat{f}) = (x, f)$ with high probability provided
$\|x\|_0 \le C_1(\sigma, n/m)m$, $\|f\|_0 \le C_2(\sigma, n/m)m$ and $f$ has random signs. Here $C_1(\sigma, n/m)$ is much
smaller than $C/(\log(n/m) + 1)$. We notice that since the authors of [30] talked about a different
model, which is motivated by [41], it may not be comparable with ours directly. However, for our
motivation of CS with corruptions, we assume that $A$ follows a symmetric distribution and obtain a better
sampling rate.

A bit later, Laska et al. [28] and Li et al. [29] also studied this problem. By setting $\lambda(m, n) = 1$,
both papers establish that for Gaussian (or sub-Gaussian) sensing matrices $A$, if $m \ge C(\|x\|_0 +
\|f\|_0)\log((n + m)/(\|x\|_0 + \|f\|_0))$, then the recovery is exact. This follows from the fact that $[A, I]$
obeys a restricted isometry property known to guarantee exact recovery of sparse vectors via $\ell_1$
minimization. Furthermore, the sparsity requirement about $x$ is the same as that found in the
standard CS literature, namely, $\|x\|_0 \le Cm/(\log(n/m) + 1)$. However, the result does not allow a
positive fraction of corruptions. For example, if $m = \sqrt{n}$, we have $\|f\|_0/m \le 2/\log n$, which will
go to zero as $n$ goes to infinity.

As for Model 2, an interesting piece of work [30] (and later [31] on the noisy case) appeared during
the preparation of this paper. These papers discuss models in which $A$ is formed by selecting rows
from an orthogonal matrix with low incoherence parameter $\mu$, which is the minimum value such
that $n|A_{ij}|^2 \le \mu$ for any $i, j$. The main result states that selecting $\lambda = \sqrt{n/(C\mu m \log n)}$ gives
exact recovery under the following assumptions: 1) the rows of $A$ are chosen from an orthogonal
matrix uniformly at random; 2) $x$ is a random signal with independent signs and equally likely
to be either $\pm 1$; 3) the support of $f$ is chosen uniformly at random. (By the de-randomization
technique introduced in [4] and used in [30], it would have been sufficient to assume that the signs
of $f$ are independent and take on the values $\pm 1$ with equal probability.) Finally, the sparsity
conditions require $m \ge C\mu^2\|x\|_0(\log n)^2$ and $\|f\|_0 \le Cm$, which are nearly optimal, for the best
known sparsity condition when $f = 0$ is $m \ge C\mu\|x\|_0\log n$. In other words, the result is optimal
up to an extra factor of $\mu\log n$; the sparsity condition about $f$ is of course nearly optimal.

However, the model for $A$ does not include some models frequently discussed in the literature
such as subsampled tight or continuous frames. Against this background, a recent paper of Candes
and Plan [7] considers a very general framework, which includes a lot of common models in the
literature. Theorem 1.2 in our paper is similar to Theorem 1 in [30]. It assumes similar sparsity
conditions, but is based on this much broader and more applicable model introduced in [7]. Notice
that we require $m \ge C\mu\|x\|_0(\log n)^2$ whereas [30] requires $m \ge C\mu^2\|x\|_0(\log n)^2$. Therefore, we
improve the condition by a factor of $\mu$, which is always at least 1 and can be as large as $n$. However,
our result imposes $\|f\|_0 \le Cm/\mu$, which is worse than $\|f\|_0 \le Cm$ by the same factor. In [30], the
parameter $\lambda$ depends upon $\mu$, while our $\lambda$ is only a function of $m$ and $n$. This is why the results
differ, and we prefer to use a value of $\lambda$ that does not depend on $\mu$ because in some applications,
an accurate estimate of $\mu$ may be difficult to obtain. In addition, we use different techniques of
proof, in which the clever golfing scheme of [21] is exploited.

Sparse approximation is another problem of underdetermined linear system where the dictionary
matrix $A$ is always assumed to be deterministic. Readers interested in this problem (which always
requires stronger sparsity conditions) may also want to study the recent paper [38] by Studer et al.
There, the authors introduce a more general problem of the form $y = Ax + Bf$, and analyze the
performance of $\ell_1$-recovery techniques by using ideas which have been popularized under the name
of generalized uncertainty principles in the basis pursuit and sparse approximation literature.

As for Model 3, Theorem 1.3 is a significant extension of the results presented in [3], in which
the authors have a stringent requirement $\rho = 0.1$. In a very recent and independent work [16], the
authors consider a model where both $O$ and $\Omega$ are unions of stochastic and deterministic subsets,
while we only assume the stochastic model. We recommend interested readers to read the paper
for the details. However, only considering their results on stochastic $O$ and $\Omega$, a direct comparison
shows that the number of samples we need is less than that in this reference. The difference is
several logarithmic factors. Actually, the requirement on $\rho$ in our paper is optimal even for clean
data in the literature of MC. Finally, we want to emphasize that the random support assumption
is essential in Theorem 1.3 when the rank is large. Examples can be found in [24].

We wish to close our introduction with a few words concerning the techniques of proof we shall
use. The proof of Theorem 1.1 is based on the concept of restricted isometry, which is a standard
technique in the literature of CS. However, our argument involves a generalization of the restricted
isometry concept. The proofs of Theorems 1.2 and 1.3 are based on the golfing scheme, an elegant
technique pioneered by David Gross [21], and later used in [4, 7, 32] to construct dual certificates.
Our proof leverages results from [3]. However, we contribute novel elements by finding an appropriate
way to phrase sufficient optimality conditions, which are amenable to the golfing scheme.
Details are presented in the following sections.



2 A Proof of Theorem 1.1

In the proof of Theorem 1.1 we will see the notation $P_T x$. Here $x$ is a $k$-dimensional vector, $T$
is a subset of $\{1, \dots, k\}$ and we also use $T$ to represent the subspace of all $k$-dimensional vectors
supported on $T$. Then $P_T x$ is the projection of $x$ onto the subspace $T$, which keeps the values
of $x$ on the support $T$ and sets the other entries to zero. In this section we use the floor
function $\lfloor \cdot \rfloor$ to represent the integer part of any real number.



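First we generalize the concept of the restricted isometry property (RIP) [11] for convenience
in proving our theorem:

Definition 2.1 For any matrix $\Phi \in \mathbb{R}^{l \times (n+m)}$, define the RIP-constant $\delta_{s_1,s_2}$ as the infimum value
of $\delta$ such that

\[ (1 - \delta)\left(\|x\|_2^2 + \|f\|_2^2\right) \le \left\| \Phi \begin{bmatrix} x \\ f \end{bmatrix} \right\|_2^2 \le (1 + \delta)\left(\|x\|_2^2 + \|f\|_2^2\right) \]

holds for any $x \in \mathbb{R}^n$ with $|\mathrm{supp}(x)| \le s_1$ and $f \in \mathbb{R}^m$ with $|\mathrm{supp}(f)| \le s_2$.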
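Lemma 2.2 For any $x_1, x_2 \in \mathbb{R}^n$ and $f_1, f_2 \in \mathbb{R}^m$ such that $\mathrm{supp}(x_1) \cap \mathrm{supp}(x_2) = \emptyset$, $|\mathrm{supp}(x_1)| +
|\mathrm{supp}(x_2)| \le s_1$ and $\mathrm{supp}(f_1) \cap \mathrm{supp}(f_2) = \emptyset$, $|\mathrm{supp}(f_1)| + |\mathrm{supp}(f_2)| \le s_2$, we have

\[ \left| \left\langle \Phi \begin{bmatrix} x_1 \\ f_1 \end{bmatrix}, \Phi \begin{bmatrix} x_2 \\ f_2 \end{bmatrix} \right\rangle \right| \le \delta_{s_1,s_2}\sqrt{\|x_1\|_2^2 + \|f_1\|_2^2}\,\sqrt{\|x_2\|_2^2 + \|f_2\|_2^2}. \]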



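Proof First, we suppose $\|x_1\|_2^2 + \|f_1\|_2^2 = \|x_2\|_2^2 + \|f_2\|_2^2 = 1$. By the definition of $\delta_{s_1,s_2}$, we have

\[ 2(1 - \delta_{s_1,s_2}) \le \left\| \Phi \begin{bmatrix} x_1 + x_2 \\ f_1 + f_2 \end{bmatrix} \right\|_2^2 \le 2(1 + \delta_{s_1,s_2}), \]

and

\[ 2(1 - \delta_{s_1,s_2}) \le \left\| \Phi \begin{bmatrix} x_1 - x_2 \\ f_1 - f_2 \end{bmatrix} \right\|_2^2 \le 2(1 + \delta_{s_1,s_2}). \]

By the above inequalities, we have $\left| \left\langle \Phi \begin{bmatrix} x_1 \\ f_1 \end{bmatrix}, \Phi \begin{bmatrix} x_2 \\ f_2 \end{bmatrix} \right\rangle \right| \le \delta_{s_1,s_2}$, and hence by homogeneity, we
have

\[ \left| \left\langle \Phi \begin{bmatrix} x_1 \\ f_1 \end{bmatrix}, \Phi \begin{bmatrix} x_2 \\ f_2 \end{bmatrix} \right\rangle \right| \le \delta_{s_1,s_2}\sqrt{\|x_1\|_2^2 + \|f_1\|_2^2}\,\sqrt{\|x_2\|_2^2 + \|f_2\|_2^2} \]

without the norm assumption.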
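
The step from the two norm bounds to the inner-product bound in the proof of Lemma 2.2 is the polarization identity. As a brief worked derivation, writing $u$ and $v$ as shorthand (not used elsewhere in the paper) for the two stacked vectors:

```latex
% Shorthand (an assumption of this note): u = [x_1; f_1], v = [x_2; f_2],
% normalized so that \|u\|_2 = \|v\|_2 = 1 with disjoint supports.
\[
\langle \Phi u, \Phi v \rangle
  = \tfrac{1}{4}\left( \|\Phi(u+v)\|_2^2 - \|\Phi(u-v)\|_2^2 \right)
  \le \tfrac{1}{4}\left( 2(1+\delta_{s_1,s_2}) - 2(1-\delta_{s_1,s_2}) \right)
  = \delta_{s_1,s_2},
\]
\[
\langle \Phi u, \Phi v \rangle
  \ge \tfrac{1}{4}\left( 2(1-\delta_{s_1,s_2}) - 2(1+\delta_{s_1,s_2}) \right)
  = -\delta_{s_1,s_2}.
\]
```

The factor 2 in both bounds comes from $\|u \pm v\|_2^2 = \|u\|_2^2 + \|v\|_2^2 = 2$, which holds because the supports are disjoint.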



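Lemma 2.3 Suppose $\Phi \in \mathbb{R}^{l \times (n+m)}$ with RIP-constant $\delta_{2s_1,2s_2} < \frac{1}{9}$ ($s_1, s_2 > 0$) and $\lambda$ is between
$\frac{1}{2}\sqrt{\frac{s_1}{s_2}}$ and $2\sqrt{\frac{s_1}{s_2}}$. Then for any $x \in \mathbb{R}^n$ with $|\mathrm{supp}(x)| \le s_1$, any $f \in \mathbb{R}^m$ with $|\mathrm{supp}(f)| \le s_2$,
and any $w \in \mathbb{R}^m$ with $\|w\|_2 \le \epsilon$, the solution $(\hat{x}, \hat{f})$ to the optimization problem (1.7) satisfies

\[ \|\hat{x} - x\|_2 + \|\hat{f} - f\|_2 \le \frac{4\sqrt{13 + 13\delta_{2s_1,2s_2}}}{1 - 9\delta_{2s_1,2s_2}}\,\epsilon. \]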

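Proof Suppose $\Delta x = \hat{x} - x$ and $\Delta f = \hat{f} - f$. Then by (1.7) we have

\[ \left\| \Phi \begin{bmatrix} \Delta x \\ \Delta f \end{bmatrix} \right\|_2 \le \|w\|_2 + \left\| \Phi \begin{bmatrix} \hat{x} \\ \hat{f} \end{bmatrix} - \left( \Phi \begin{bmatrix} x \\ f \end{bmatrix} + w \right) \right\|_2 \le 2\epsilon. \]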



It is easy to check that the original $(x, f)$ satisfies the inequality constraint in (1.7), so we have

\[ \|x + \Delta x\|_1 + \lambda\|f + \Delta f\|_1 \le \|x\|_1 + \lambda\|f\|_1. \tag{2.1} \]

Then it suffices to show $\|\Delta x\|_2 + \|\Delta f\|_2 \le \frac{4\sqrt{13 + 13\delta_{2s_1,2s_2}}}{1 - 9\delta_{2s_1,2s_2}}\,\epsilon$.

Suppose $T_0$ with $|T_0| = s_1$ is such that $\mathrm{supp}(x) \subset T_0$. Denote $T_0^c = T_1 \cup \dots \cup T_l$ where $|T_1| =
\dots = |T_{l-1}| = s_1$ and $|T_l| \le s_1$. Moreover, suppose $T_1$ contains the indices of the $s_1$ largest (in the
sense of absolute value) coefficients of $P_{T_0^c}\Delta x$, $T_2$ contains the indices of the $s_1$ largest coefficients
of $P_{(T_0 \cup T_1)^c}\Delta x$, and so on. Similarly, define $V_0$ such that $\mathrm{supp}(f) \subset V_0$ and $|V_0| = s_2$, and divide
$V_0^c = V_1 \cup \dots \cup V_k$ in the same way. By this setup, we easily have

\[ \sum_{i \ge 2} \|P_{T_i}\Delta x\|_2 \le s_1^{-1/2}\|P_{T_0^c}\Delta x\|_1, \tag{2.2} \]

and

\[ \sum_{j \ge 2} \|P_{V_j}\Delta f\|_2 \le s_2^{-1/2}\|P_{V_0^c}\Delta f\|_1. \tag{2.3} \]
On the other hand, by the assumption $\mathrm{supp}(x) \subset T_0$ and $\mathrm{supp}(f) \subset V_0$, we have

\[ \|x + \Delta x\|_1 = \|P_{T_0}x + P_{T_0}\Delta x\|_1 + \|P_{T_0^c}\Delta x\|_1 \ge \|x\|_1 - \|P_{T_0}\Delta x\|_1 + \|P_{T_0^c}\Delta x\|_1, \tag{2.4} \]

and similarly,

\[ \|f + \Delta f\|_1 \ge \|f\|_1 - \|P_{V_0}\Delta f\|_1 + \|P_{V_0^c}\Delta f\|_1. \tag{2.5} \]

By inequalities (2.1), (2.4) and (2.5), we have

\[ \|P_{T_0^c}\Delta x\|_1 + \lambda\|P_{V_0^c}\Delta f\|_1 \le \|P_{T_0}\Delta x\|_1 + \lambda\|P_{V_0}\Delta f\|_1. \tag{2.6} \]



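By the definition of $\delta_{2s_1,2s_2}$, the fact $\left\| \Phi \begin{bmatrix} \Delta x \\ \Delta f \end{bmatrix} \right\|_2 \le 2\epsilon$ and Lemma 2.2, we have

\[
\begin{aligned}
&(1 - \delta_{2s_1,2s_2})\left( \|P_{T_0}\Delta x + P_{T_1}\Delta x\|_2^2 + \|P_{V_0}\Delta f + P_{V_1}\Delta f\|_2^2 \right) \\
&\quad \le \left\| \Phi \begin{bmatrix} P_{T_0}\Delta x + P_{T_1}\Delta x \\ P_{V_0}\Delta f + P_{V_1}\Delta f \end{bmatrix} \right\|_2^2 \\
&\quad = \left\langle \Phi \begin{bmatrix} P_{T_0}\Delta x + P_{T_1}\Delta x \\ P_{V_0}\Delta f + P_{V_1}\Delta f \end{bmatrix},\ \Phi \begin{bmatrix} \Delta x \\ \Delta f \end{bmatrix} - \Phi \begin{bmatrix} P_{T_2}\Delta x + \dots + P_{T_l}\Delta x \\ P_{V_2}\Delta f + \dots + P_{V_k}\Delta f \end{bmatrix} \right\rangle \\
&\quad \le -\left\langle \Phi \begin{bmatrix} P_{T_0}\Delta x + P_{T_1}\Delta x \\ P_{V_0}\Delta f + P_{V_1}\Delta f \end{bmatrix},\ \Phi \begin{bmatrix} P_{T_2}\Delta x + \dots + P_{T_l}\Delta x \\ P_{V_2}\Delta f + \dots + P_{V_k}\Delta f \end{bmatrix} \right\rangle + 2\epsilon \left\| \Phi \begin{bmatrix} P_{T_0}\Delta x + P_{T_1}\Delta x \\ P_{V_0}\Delta f + P_{V_1}\Delta f \end{bmatrix} \right\|_2 \\
&\quad \le \delta_{2s_1,2s_2}\left( \left\| \begin{bmatrix} P_{T_0}\Delta x \\ P_{V_0}\Delta f \end{bmatrix} \right\|_2 + \left\| \begin{bmatrix} P_{T_1}\Delta x \\ P_{V_1}\Delta f \end{bmatrix} \right\|_2 \right)\left( \sum_{i \ge 2}\|P_{T_i}\Delta x\|_2 + \sum_{j \ge 2}\|P_{V_j}\Delta f\|_2 \right) \\
&\qquad + 2\epsilon\sqrt{1 + \delta_{2s_1,2s_2}}\,\sqrt{\|P_{T_0}\Delta x\|_2^2 + \|P_{T_1}\Delta x\|_2^2 + \|P_{V_0}\Delta f\|_2^2 + \|P_{V_1}\Delta f\|_2^2}.
\end{aligned}
\]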
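Moreover, since

\[
\begin{aligned}
\sum_{i \ge 2}\|P_{T_i}\Delta x\|_2 + \sum_{j \ge 2}\|P_{V_j}\Delta f\|_2
&\le s_1^{-1/2}\|P_{T_0^c}\Delta x\|_1 + s_2^{-1/2}\|P_{V_0^c}\Delta f\|_1 && \text{by (2.2) and (2.3)} \\
&\le 2s_1^{-1/2}\left( \|P_{T_0^c}\Delta x\|_1 + \lambda\|P_{V_0^c}\Delta f\|_1 \right) && \text{by } \lambda \ge \tfrac{1}{2}\sqrt{s_1/s_2} \\
&\le 2s_1^{-1/2}\left( \|P_{T_0}\Delta x\|_1 + \lambda\|P_{V_0}\Delta f\|_1 \right) && \text{by (2.6)} \\
&\le 2s_1^{-1/2}\left( s_1^{1/2}\|P_{T_0}\Delta x\|_2 + \lambda s_2^{1/2}\|P_{V_0}\Delta f\|_2 \right) && \text{by the Cauchy-Schwarz inequality} \\
&\le 4\|P_{T_0}\Delta x\|_2 + 4\|P_{V_0}\Delta f\|_2, && \text{by } \lambda \le 2\sqrt{s_1/s_2}
\end{aligned}
\]

we have

\[
\left( \left\| \begin{bmatrix} P_{T_0}\Delta x \\ P_{V_0}\Delta f \end{bmatrix} \right\|_2 + \left\| \begin{bmatrix} P_{T_1}\Delta x \\ P_{V_1}\Delta f \end{bmatrix} \right\|_2 \right)\left( \sum_{i \ge 2}\|P_{T_i}\Delta x\|_2 + \sum_{j \ge 2}\|P_{V_j}\Delta f\|_2 \right)
\le 8\left( \|P_{T_0}\Delta x\|_2^2 + \|P_{T_1}\Delta x\|_2^2 + \|P_{V_0}\Delta f\|_2^2 + \|P_{V_1}\Delta f\|_2^2 \right).
\]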
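Therefore, by $\delta_{2s_1,2s_2} < \frac{1}{9}$ we have

\[ \sqrt{\|P_{T_0}\Delta x\|_2^2 + \|P_{T_1}\Delta x\|_2^2 + \|P_{V_0}\Delta f\|_2^2 + \|P_{V_1}\Delta f\|_2^2} \le \frac{2\epsilon\sqrt{1 + \delta_{2s_1,2s_2}}}{1 - 9\delta_{2s_1,2s_2}}. \]

Since

\[ \sum_{i \ge 2}\|P_{T_i}\Delta x\|_2 + \sum_{j \ge 2}\|P_{V_j}\Delta f\|_2 \le 4\|P_{T_0}\Delta x\|_2 + 4\|P_{V_0}\Delta f\|_2, \]

we have

\[
\begin{aligned}
\|\Delta x\|_2 + \|\Delta f\|_2 &\le 5\left( \|P_{T_0}\Delta x\|_2 + \|P_{V_0}\Delta f\|_2 \right) + \left( \|P_{T_1}\Delta x\|_2 + \|P_{V_1}\Delta f\|_2 \right) \\
&\le \sqrt{52}\,\sqrt{\|P_{T_0}\Delta x\|_2^2 + \|P_{T_1}\Delta x\|_2^2 + \|P_{V_0}\Delta f\|_2^2 + \|P_{V_1}\Delta f\|_2^2} \\
&\le \frac{4\sqrt{13 + 13\delta_{2s_1,2s_2}}}{1 - 9\delta_{2s_1,2s_2}}\,\epsilon.
\end{aligned}
\]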



We now cite a well-known result in the literature of CS, e.g. Theorem 5.2 of [3]. 

Lemma 2.4 Suppose $A$ is a random matrix defined in model 1. Then for any $0 < \delta < 1$, there
exist $c_1(\delta), c_2(\delta) > 0$ such that with probability at least $1 - 2\exp(-c_2(\delta)m)$,

\[ (1 - \delta)\|x\|_2^2 \le \|Ax\|_2^2 \le (1 + \delta)\|x\|_2^2 \]

holds universally for any $x$ with $|\mathrm{supp}(x)| \le c_1(\delta)\frac{m}{\log(n/m) + 1}$.

Also, we cite a well-known result which gives a bound for the largest singular value of a random
matrix, e.g. [17] and [55].

Lemma 2.5 Let $B$ be an $m \times n$ matrix whose entries are independent standard normal random
variables. Then for every $t \ge 0$, with probability at least $1 - 2\exp(-t^2/2)$, one has $\|B\|_{2,2} \le
\sqrt{m} + \sqrt{n} + t$.
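
A quick numerical sanity check of the bound in Lemma 2.5 is sketched below; the dimensions, the seed and the value of `t` are arbitrary illustrative choices, and a single draw of course illustrates the statement rather than proving it.

```python
import numpy as np

def spectral_norm(B):
    """Largest singular value of B (the operator norm from l2 to l2)."""
    return np.linalg.norm(B, 2)

rng = np.random.default_rng(3)
m, n, t = 80, 200, 5.0
B = rng.standard_normal((m, n))              # iid standard normal entries
bound = np.sqrt(m) + np.sqrt(n) + t          # right-hand side of Lemma 2.5
failure_prob = 2.0 * np.exp(-t ** 2 / 2.0)   # allowed failure probability
print("largest singular value:", spectral_norm(B))
print("bound:", bound, "fails with probability at most", failure_prob)
```

For a matrix with variance $1/m$ entries, as in Model 1, the same bound applies after dividing by $\sqrt{m}$.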



We now prove Theorem 1.1.
Proof Suppose $\alpha$, $\delta$ are two constants independent of $m$ and $n$, whose values will be specified
later. Set $s_1 = \left\lfloor \alpha\frac{m}{\log(n/m) + 1} \right\rfloor$ and $s_2 = \lfloor \alpha m \rfloor$. We want to bound the RIP-constant $\delta_{2s_1,2s_2}$ for the
$m \times (n + m)$ matrix $\Phi = [A, I]$ when $\alpha$ is sufficiently small. For any $T$ with $|T| = 2s_1$ and $V$ with
$|V| = 2s_2$, and any $x$ with $\mathrm{supp}(x) \subset T$, any $f$ with $\mathrm{supp}(f) \subset V$, we have

\[ \|Ax + f\|_2^2 = \|Ax\|_2^2 + \|f\|_2^2 + 2\langle P_V A P_T x, f \rangle. \]

By Lemma 2.4, assuming $\alpha \le c_1(\delta)$, with probability at least $1 - 2\exp(-c_2(\delta)m)$ we have

\[ (1 - \delta)\|x\|_2^2 \le \|Ax\|_2^2 \le (1 + \delta)\|x\|_2^2 \tag{2.7} \]
holds universally for any such $T$ and $x$.

Now we fix $T$ and $V$, and we want to bound $\|P_V A P_T\|_{2,2}$. By Lemma 2.5 we actually have

\[ \|P_V A P_T\|_{2,2} \le \frac{1}{\sqrt{m}}\left( \sqrt{2s_1} + \sqrt{2s_2} + \sqrt{\delta^2 m} \right) \le 2\sqrt{2\alpha} + \delta \tag{2.8} \]

with probability at least $1 - 2\exp(-\delta^2 m/2)$. Then with probability at least $1 - 2\exp(-\delta^2 m/2)\binom{n}{2s_1}\binom{m}{2s_2}$,
inequality (2.8) holds universally for any $T$ satisfying $|T| = 2s_1$ and $V$ satisfying $|V| = 2s_2$.
By $2s_1 \le 2\alpha\frac{m}{\log(n/m) + 1}$, we have $2s_1\log\left(\frac{en}{2s_1}\right) \le \alpha_1 m$, where $\alpha_1$ only depends on $\alpha$ and $\alpha_1 \to 0$
as $\alpha \to 0$, and hence $\binom{n}{2s_1} \le \left(\frac{en}{2s_1}\right)^{2s_1} \le \exp(\alpha_1 m)$. Similarly, because $2s_2 \le 2\alpha m$, we have
$2s_2\log\left(\frac{em}{2s_2}\right) \le \alpha_2 m$, where $\alpha_2$ only depends on $\alpha$ and $\alpha_2 \to 0$ as $\alpha \to 0$, and hence $\binom{m}{2s_2} \le
\left(\frac{em}{2s_2}\right)^{2s_2} \le \exp(\alpha_2 m)$. Therefore, inequality (2.8) holds universally for any such $T$ and $V$ with
probability at least $1 - 2\exp(-(\delta^2/2 - \alpha_1 - \alpha_2)m)$.

Combined with (2.7) we have

\[
(1 - \delta)\|x\|_2^2 + \|f\|_2^2 - 2(2\sqrt{2\alpha} + \delta)\|x\|_2\|f\|_2
\le \left\| [A, I]\begin{bmatrix} x \\ f \end{bmatrix} \right\|_2^2
\le (1 + \delta)\|x\|_2^2 + \|f\|_2^2 + 2(2\sqrt{2\alpha} + \delta)\|x\|_2\|f\|_2
\]

holds universally for any such $T$, $V$, $x$ and $f$ with probability at least $1 - 2\exp(-c_2(\delta)m) -
2\exp(-(\delta^2/2 - \alpha_1 - \alpha_2)m)$. By choosing an appropriate $\delta$ and letting $\alpha$ be sufficiently small, we have
$\delta_{2s_1,2s_2} < 1/9$ with probability at least $1 - Ce^{-cm}$.



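Moreover, under the assumption that $\alpha\frac{m}{\log(n/m) + 1} \ge 1$, we have $s_1 = \left\lfloor \alpha\frac{m}{\log(n/m) + 1} \right\rfloor > 0$,
$s_2 = \lfloor \alpha m \rfloor > 0$ and

\[ \frac{1}{2}\sqrt{\frac{s_1}{s_2}} \le \frac{1}{\sqrt{\log(n/m) + 1}} \le 2\sqrt{\frac{s_1}{s_2}}. \]

Then Theorem 1.1 follows as a direct corollary of Lemma 2.3.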



3 A Proof of Theorem 1.2

In this section we will encounter several absolute constants. Instead of denoting them by $C_1, C_2, \dots$,
we just use $C$, i.e., the values of $C$ change from line to line. Also, we will use the phrase "with
high probability" to mean with probability at least $1 - Cn^{-c}$, where $C > 0$ is a numerical constant
and $c = 3$, 4, or 5 depending on the context.

Here we will use a lot of notation to represent sub-matrices and sub-vectors. Suppose $A \in \mathbb{R}^{m \times n}$,
$P \subset [m] := \{1, \dots, m\}$, $Q \subset [n]$ and $i \in [n]$. We denote by $A_{P,:}$ the sub-matrix of $A$ with row indices
contained in $P$, by $A_{:,Q}$ the sub-matrix of $A$ with column indices contained in $Q$, and by $A_{P,Q}$ the
sub-matrix of $A$ with row indices contained in $P$ and column indices contained in $Q$. Moreover, we
denote by $A_{P,i}$ the sub-matrix of $A$ with row indices contained in $P$ and column $i$, which is actually
a column vector.

The term "vector" means column vector in this section, and all row vectors are denoted by an
adjoint of a vector, such as $a^*$ for a vector $a$. Suppose $a$ is a vector and $T$ a subset of indices. Then
we denote by $a_T$ the restriction of $a$ on $T$, i.e., a vector with all elements of $a$ with indices in $T$.
For any vector $v$, we use $v_{\{i\}}$ to denote the $i$-th element of $v$.
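
For readers who think in array terms, the sub-matrix and restriction notation corresponds to the following NumPy indexing. This is an illustrative sketch only; note that the index sets here are 0-based, whereas the paper indexes from 1.

```python
import numpy as np

A = np.arange(20.0).reshape(4, 5)    # m = 4 rows, n = 5 columns
P = [0, 2]                           # a set of row indices
Q = [1, 3, 4]                        # a set of column indices
i = 3                                # a single column index

A_P_all = A[P, :]                    # A_{P,:}  rows in P, all columns
A_all_Q = A[:, Q]                    # A_{:,Q}  all rows, columns in Q
A_P_Q = A[np.ix_(P, Q)]              # A_{P,Q}  rows in P and columns in Q
A_P_i = A[P, i].reshape(-1, 1)       # A_{P,i}  kept as a column vector

a = np.array([5.0, 6.0, 7.0, 8.0, 9.0])
T = [0, 4]
a_T = a[T]                           # restriction of a to the index set T
print(A_P_Q)
print(a_T)
```

`np.ix_` is needed for the two-set case because plain `A[P, Q]` would pair the indices elementwise instead of forming the sub-matrix.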
3.1 Supporting lemmas

To prove Theorem 1.2 we need some supporting lemmas. Because our model of sensing matrix $A$
is the same as in [7], we will cite some lemmas from it directly.

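Lemma 3.1 (Lemma 2.1 of [7]) Suppose $A$ is as defined in model 2. Let $T \subset [n]$ be a fixed set of
cardinality $s$. Then for $\delta > 0$, $\mathbb{P}\left( \|A^*_{:,T}A_{:,T} - I\|_{2,2} \ge \delta \right) \le 2s\exp\left( -\frac{m}{\mu s} \cdot \frac{\delta^2}{2(1 + \delta/3)} \right)$. In particular,
$\|A^*_{:,T}A_{:,T} - I\|_{2,2} \le \frac{1}{2}$ with high probability provided $s \le \gamma\frac{m}{\mu\log n}$, and $\|A^*_{:,T}A_{:,T} - I\|_{2,2} \le \frac{1}{2\sqrt{\log n}}$
with high probability provided $s \le \gamma\frac{m}{\mu\log^2 n}$, where $\gamma$ is some absolute constant.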



This lemma was proved in [7] by the matrix Bernstein inequality, which was first introduced by [2].
A deep generalization is given in [25].



Lemma 3.2 (Lemma 2.4 of [7]) Suppose $A$ is as defined in model 2. Fix $T \subset [n]$ with $|T| = s$
and $v \in \mathbb{R}^s$. Then $\|A^*_{:,T^c}A_{:,T}v\|_\infty \le \frac{1}{20\sqrt{s}}\|v\|_2$ with high probability provided $s \le \gamma\frac{m}{\mu\log n}$, where $\gamma$ is
some absolute constant.

Lemma 3.3 (Lemma 2.5 of [7]) Suppose $A$ is as defined in model 2. Fix $T \subset [n]$ with $|T| = s$.
Then $\max_{i \in T^c}\|A^*_{:,T}A_{:,i}\|_2 \le 1$ with high probability provided $s \le \gamma\frac{m}{\mu\log n}$, where $\gamma$ is some absolute
constant.

3.2 A proof of Theorem 1.2 

In this part we will give a complete proof of Theorem 1.2 with a powerful technique called the "golfing
scheme", introduced by David Gross in [21], and later used in [3] and [7]. Under the assumption of model
2, we additionally assume $s \le \alpha\frac{m}{\mu\log^2 n}$ and $m_b \le \beta\frac{m}{\mu}$, where $\alpha$ and $\beta$ are numerical constants
whose values will be specified later.



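First we give two useful inequalities. By replacing $A$ with $\sqrt{\frac{m}{m - m_b}}A_{B^c,:}$ in Lemma 3.1 and Lemma
3.3, we have

\[ \left\| \frac{m}{m - m_b}A^*_{B^c,T}A_{B^c,T} - I \right\|_{2,2} \le \frac{1}{2} \tag{3.1} \]

and

\[ \max_{i \in T^c}\left\| \frac{m}{m - m_b}A^*_{B^c,T}A_{B^c,i} \right\|_2 \le 1 \tag{3.2} \]

with high probability provided $s \le \gamma\frac{m - m_b}{\mu\log n}$. Since $s \le \alpha\frac{m}{\mu\log^2 n}$ and $m_b \le \beta\frac{m}{\mu}$, both (3.1) and (3.2)
hold with high probability provided $\alpha$ and $\beta$ are sufficiently small. We assume (3.1) and (3.2) hold
throughout this section.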

First we prove that the solution $(\hat{x}, \hat{f})$ of (1.3) equals $(x, f)$ if we can find an appropriate dual
vector $q_{B^c}$ satisfying the following requirement. This is actually an "inexact dual vector" of the
optimization problem (1.3). This idea was first given explicitly in [22] and [21], and is related to [5].
We give a result similar to [7].
Lemma 3.4 (Inexact Duality) Suppose there exists a vector $q_{B^c} \in \mathbb{R}^{m - m_b}$ satisfying

\[ \|v_T - \mathrm{sgn}(x_T)\|_2 \le \lambda/4, \quad \|v_{T^c}\|_\infty \le 1/4 \quad \text{and} \quad \|q_{B^c}\|_\infty \le \lambda/4, \tag{3.3} \]

where

\[ v = A^*_{B^c,:}q_{B^c} + A^*_{B,:}\lambda\,\mathrm{sgn}(f_B). \tag{3.4} \]

Then the solution $(\hat{x}, \hat{f})$ of (1.3) equals $(x, f)$ provided $\beta$ is sufficiently small and $\lambda < \frac{3}{2}$.

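Proof Set $h = \hat{x} - x$. By $x_{T^c} = 0$ we have

\[ h_{T^c} = \hat{x}_{T^c}. \tag{3.5} \]

By $f_{B^c} = 0$ and $A\hat{x} + \hat{f} = Ax + f$, we have $Ah = f - \hat{f}$ and

\[ A_{B^c,:}h = (f - \hat{f})_{B^c} = -\hat{f}_{B^c}. \tag{3.6} \]

Then we have the following inequality

\[
\begin{aligned}
\|\hat{x}\|_1 + \lambda\|\hat{f}\|_1
&= \langle \hat{x}_T, \mathrm{sgn}(\hat{x}_T) \rangle + \|\hat{x}_{T^c}\|_1 + \lambda\left( \langle \hat{f}_B, \mathrm{sgn}(\hat{f}_B) \rangle + \|\hat{f}_{B^c}\|_1 \right) \\
&\ge \langle \hat{x}_T, \mathrm{sgn}(x_T) \rangle + \|\hat{x}_{T^c}\|_1 + \lambda\left( \langle \hat{f}_B, \mathrm{sgn}(f_B) \rangle + \|\hat{f}_{B^c}\|_1 \right) \\
&= \langle x_T + h_T, \mathrm{sgn}(x_T) \rangle + \|h_{T^c}\|_1 + \lambda\left( \langle f_B - A_{B,:}h, \mathrm{sgn}(f_B) \rangle + \|A_{B^c,:}h\|_1 \right) && \text{by (3.5) and (3.6)} \\
&= \|x\|_1 + \lambda\|f\|_1 + \|h_{T^c}\|_1 + \lambda\|A_{B^c,:}h\|_1 + \langle h_T, \mathrm{sgn}(x_T) \rangle - \lambda\langle A_{B,:}h, \mathrm{sgn}(f_B) \rangle.
\end{aligned}
\]

Since $\|\hat{x}\|_1 + \lambda\|\hat{f}\|_1 \le \|x\|_1 + \lambda\|f\|_1$, we have

\[ \|h_{T^c}\|_1 + \lambda\|A_{B^c,:}h\|_1 + \langle h_T, \mathrm{sgn}(x_T) \rangle - \lambda\langle A_{B,:}h, \mathrm{sgn}(f_B) \rangle \le 0. \tag{3.7} \]

By (3.4), we have

\[ \langle h_T, v_T \rangle + \langle h_{T^c}, v_{T^c} \rangle = \langle h, v \rangle = \langle h, A^*_{B^c,:}q_{B^c} + A^*_{B,:}\lambda\,\mathrm{sgn}(f_B) \rangle = \langle A_{B^c,:}h, q_{B^c} \rangle + \lambda\langle A_{B,:}h, \mathrm{sgn}(f_B) \rangle, \]

and then by (3.3),

\[
\begin{aligned}
\langle h_T, \mathrm{sgn}(x_T) \rangle - \lambda\langle A_{B,:}h, \mathrm{sgn}(f_B) \rangle
&= \langle h_T, \mathrm{sgn}(x_T) - v_T \rangle + \langle A_{B^c,:}h, q_{B^c} \rangle - \langle h_{T^c}, v_{T^c} \rangle \\
&\ge -\frac{\lambda}{4}\|h_T\|_2 - \frac{\lambda}{4}\|A_{B^c,:}h\|_1 - \frac{1}{4}\|h_{T^c}\|_1.
\end{aligned}
\]

Combining this with (3.7), we have

\[ -\frac{\lambda}{4}\|h_T\|_2 + \frac{3\lambda}{4}\|A_{B^c,:}h\|_1 + \frac{3}{4}\|h_{T^c}\|_1 \le 0. \tag{3.8} \]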
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By (I3.ip . we have 
at least \. Therefore, 



m ^* 
m—m\y B C ,T 



2:2 



< J I and the smallest singular value of m ™ m A* Bc T A B c T is 



\\hrh < 2 

< 2 

< 2 



??? 



m — mi, 
rn 



m — nib 
m 



A* BC rpAgc fhr 



-A*gc j'AB c 1 T c hT c 



s 2 E 

i£T c 



m — 

m 



m — nib 
< 211/tTclh + V6 



A B cxAB c ,i 



+ 

2 

+ V6 



m 



m — nib 



A B c T ABc,;h 



m 



m — nib 



-A B c h 



h{i} | + VQ 



in 



m — nib 



-A B c h 



m — nib 



Plugging this into (USD, we have(f - \\) \\hq«*\\i + (f - ^ J ^r h ) M\ A B*,M\i < °- We know 



A B c.h 



By the triangle inequality 
By 



m—m^ 

I ~~ ^¥ \J ^ when /5 is sufficiently small. Moreover, by the assumption A < |, we have 
hj>c = and A B c.h = 0. Since A B c .h = A B c^\it + ^4b c ,t c ^t c ) we have A B c^hT = 0. The 
inequality (13.ip implies that A B c qp is injective, so /it = and h = + /it c = 0, which implies 
(x,f) = (xj). m 
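The last step of the proof needs two coefficients to be strictly positive: $\frac{3}{4} - \frac{\lambda}{2}$ and $\frac{3}{4} - \frac{\sqrt{6}}{4}\sqrt{\frac{m}{m-m_b}}$. The following Python sketch only evaluates these two numbers for given $m$, $m_b$ and $\lambda$; the function names are illustrative and are not part of the paper.

```python
import math


def duality_coefficients(m, m_b, lam):
    """Return the two coefficients from the end of the proof of Lemma 3.4.

    The first multiplies ||h_{T^c}||_1, the second multiplies
    lam * ||A_{B^c,:} h||_1. Both must be positive for the argument to
    force h_{T^c} = 0 and A_{B^c,:} h = 0.
    """
    if not 0 <= m_b < m:
        raise ValueError("need 0 <= m_b < m")
    coeff_h = 3.0 / 4.0 - lam / 2.0
    coeff_Ah = 3.0 / 4.0 - (math.sqrt(6.0) / 4.0) * math.sqrt(m / (m - m_b))
    return coeff_h, coeff_Ah


def argument_applies(m, m_b, lam):
    """True when both coefficients are strictly positive."""
    coeff_h, coeff_Ah = duality_coefficients(m, m_b, lam)
    return coeff_h > 0 and coeff_Ah > 0


# lam = 1/sqrt(log n) with n = 1000 and 10% corrupted measurements
lam = 1.0 / math.sqrt(math.log(1000))
print(argument_applies(1000, 100, lam))
# a much larger corrupted fraction makes the second coefficient negative
print(argument_applies(1000, 400, 0.5))
```

With these particular constants the second coefficient is positive exactly when $m_b < m/3$, which is one concrete reading of "$\beta$ sufficiently small" in this step of the argument.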

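Now let's construct a vector $q_{B^c}$ satisfying the requirement (3.3) by choosing an appropriate $\lambda$.

Proof (of Theorem 1.2) Set $\lambda = \frac{1}{\sqrt{\log n}}$. It suffices to construct a $q_{B^c}$ satisfying (3.3). Denoting $u = A_{B^c,:}^*\, q_{B^c}$, we only need to construct a $q_{B^c}$ satisfying

$$\|u_T + \lambda A_{B,T}^* \mathrm{sgn}(f_B) - \mathrm{sgn}(x_T)\|_2 \le \frac{\lambda}{4}, \quad \|u_{T^c}\|_\infty \le \frac{1}{8}, \quad \|\lambda A_{B,:}^* \mathrm{sgn}(f_B)\|_\infty \le \frac{1}{8}, \quad \|q_{B^c}\|_\infty \le \frac{\lambda}{4}.$$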

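Now let's construct our $q_{B^c}$ by the golfing scheme. First we have to write $A_{B^c,:}$ as a block matrix. We divide $B^c$ into $l = \lfloor \log_2 n + 1 \rfloor$ disjoint subsets: $B^c = G_1 \cup \dots \cup G_l$ where $|G_i| = m_i$. Then we have $\sum_{i=1}^{l} m_i = m - m_b$ and

$$A_{B^c,:} = \begin{bmatrix} A_{G_1,:} \\ \vdots \\ A_{G_l,:} \end{bmatrix}.$$

We want to mention that the partition of $B^c$ is deterministic, not depending on $A$, so $A_{G_1,:}, \dots, A_{G_l,:}$ are independent. Noticing $m_b \le \beta \frac{m}{\mu} \le \beta m$, by letting $\beta$ be sufficiently small, we can require

$$\frac{m}{m_1} \le C, \quad \frac{m}{m_2} \le C, \quad \frac{m}{m_k} \le C \log n \ \text{ for } k = 3, \dots, l$$

for some absolute constant $C$. Since $s \le \alpha \frac{m}{\mu \log^2 n}$, we have

$$s \le \alpha C \frac{m_1}{\mu \log^2 n}, \quad s \le \alpha C \frac{m_2}{\mu \log^2 n}, \quad s \le \alpha C \frac{m_k}{\mu \log n} \ \text{ for } k = 3, \dots, l. \tag{3.9}$$

Then by Lemma 3.1, replacing $A$ with $\sqrt{\frac{m}{m_j}}\,A_{G_j,:}$, we have the following inequalities:

$$\left\| \frac{m}{m_j} A_{G_j,T}^* A_{G_j,T} - I \right\|_{2,2} \le \frac{1}{2\sqrt{\log n}} \quad \text{for } j = 1, 2; \tag{3.10}$$

$$\left\| \frac{m}{m_j} A_{G_j,T}^* A_{G_j,T} - I \right\|_{2,2} \le \frac{1}{2} \quad \text{for } j = 3, \dots, l; \tag{3.11}$$

with high probability provided $\alpha$ is sufficiently small.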



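Now let's give an explicit construction of $q_{B^c}$. Define

$$p_0 = \mathrm{sgn}(x_T) - \lambda A_{B,T}^* \mathrm{sgn}(f_B) \tag{3.12}$$

and

$$p_i = \left(I - \frac{m}{m_i} A_{G_i,T}^* A_{G_i,T}\right) p_{i-1} = \left(I - \frac{m}{m_i} A_{G_i,T}^* A_{G_i,T}\right) \cdots \left(I - \frac{m}{m_1} A_{G_1,T}^* A_{G_1,T}\right) p_0 \tag{3.13}$$

for $i = 1, \dots, l$, and construct

$$q_{B^c} = \begin{bmatrix} \frac{m}{m_1} A_{G_1,T}\, p_0 \\ \vdots \\ \frac{m}{m_l} A_{G_l,T}\, p_{l-1} \end{bmatrix}. \tag{3.14}$$

Then by $u = A_{B^c,:}^*\, q_{B^c}$, we have

$$u = A_{B^c,:}^* \begin{bmatrix} \frac{m}{m_1} A_{G_1,T}\, p_0 \\ \vdots \\ \frac{m}{m_l} A_{G_l,T}\, p_{l-1} \end{bmatrix} = \sum_{i=1}^{l} \frac{m}{m_i} A_{G_i,:}^* A_{G_i,T}\, p_{i-1}. \tag{3.15}$$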

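We now bound the $\ell_2$ norm of $p_i$. Actually, by (3.10), (3.11) and (3.13), we have

$$\|p_1\|_2 \le \frac{1}{2\sqrt{\log n}}\, \|p_0\|_2, \tag{3.16}$$

$$\|p_2\|_2 \le \frac{1}{4\log n}\, \|p_0\|_2, \tag{3.17}$$

$$\|p_j\|_2 \le \frac{1}{\log n}\left(\frac{1}{2}\right)^{j} \|p_0\|_2 \quad \text{for } j = 3, \dots, l. \tag{3.18}$$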
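The bounds (3.16) to (3.18) are products of per-step contraction factors: $\frac{1}{2\sqrt{\log n}}$ for the first two blocks and $\frac{1}{2}$ for each later block. A minimal Python sketch of this bookkeeping follows; the function name is hypothetical, and the value $n = 1024$ is chosen only for illustration.

```python
import math


def golfing_norm_factor(j, n):
    """Upper bound on ||p_j||_2 / ||p_0||_2 obtained by multiplying the
    per-step contraction factors from (3.10) and (3.11)."""
    if j < 0:
        raise ValueError("j must be nonnegative")
    first_two = 1.0 / (2.0 * math.sqrt(math.log(n)))  # steps 1 and 2
    factor = 1.0
    for step in range(1, j + 1):
        factor *= first_two if step <= 2 else 0.5     # later steps contract by 1/2
    return factor


n = 1024
l = math.floor(math.log2(n) + 1)   # number of blocks in the partition of B^c
for j in (1, 2, 3, l):
    print(j, golfing_norm_factor(j, n))
```

For $j \ge 3$ the product equals $\frac{1}{\log n}\left(\frac{1}{2}\right)^{j}$, and at $j = l = \lfloor \log_2 n + 1 \rfloor$ it is at most $\frac{1}{n \log n}$, which is the decay used later to control $\|p_l\|_2$.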

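Now we will prove our constructed $q_{B^c}$ satisfies the desired requirements:

The proof of $\|\lambda A_{B,:}^* \mathrm{sgn}(f_B)\|_\infty \le \frac{1}{8}$

By Hoeffding's inequality, for any $i = 1, \dots, n$, we have

$$\mathbb{P}\big(|A_{B,i}^* \mathrm{sgn}(f_B)| \ge t\big) \le 2\exp\left(-\frac{t^2}{2\|A_{B,i}\|_2^2}\right).$$

By choosing $t = C\sqrt{\log n}\,\|A_{B,i}\|_2$ ($C$ is some absolute constant), with high probability, we have $|\lambda A_{B,i}^* \mathrm{sgn}(f_B)| \le \lambda C\sqrt{\log n}\,\|A_{B,i}\|_2 \le C\sqrt{\frac{\mu m_b}{m}} \le C\sqrt{\beta} \le \frac{1}{8}$, provided $\beta$ is sufficiently small, and this implies $\|\lambda A_{B,:}^* \mathrm{sgn}(f_B)\|_\infty \le \frac{1}{8}$.

The proof of $\|u_T + \lambda A_{B,T}^* \mathrm{sgn}(f_B) - \mathrm{sgn}(x_T)\|_2 \le \frac{\lambda}{4}$

By (3.15) and (3.13), we have

$$u_T = \sum_{i=1}^{l} \frac{m}{m_i} A_{G_i,T}^* A_{G_i,T}\, p_{i-1} = \sum_{i=1}^{l} (p_{i-1} - p_i) = p_0 - p_l.$$

Then by (3.12) we have $\|u_T + \lambda A_{B,T}^* \mathrm{sgn}(f_B) - \mathrm{sgn}(x_T)\|_2 = \|u_T - p_0\|_2 = \|p_l\|_2$. Since $\|\lambda A_{B,:}^* \mathrm{sgn}(f_B)\|_\infty \le 1/8$, we have $\|\lambda A_{B,T}^* \mathrm{sgn}(f_B)\|_\infty \le 1/8$, which implies

$$\|p_0\|_2 = \|\lambda A_{B,T}^* \mathrm{sgn}(f_B) - \mathrm{sgn}(x_T)\|_2 \le \frac{9}{8}\sqrt{s}. \tag{3.19}$$

Then by (3.18) and $l = \lfloor \log_2 n + 1 \rfloor$, we have

$$\|p_l\|_2 \le \frac{1}{\log n}\left(\frac{1}{2}\right)^{l} \|p_0\|_2 \le \frac{1}{\log n}\cdot\frac{1}{n}\cdot\frac{9}{8}\sqrt{s} \le \frac{\lambda}{4},$$

provided $\alpha$ is sufficiently small.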
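The identity $u_T = p_0 - p_l$ is a purely algebraic telescoping sum and does not depend on the probabilistic structure of $A$. The Python sketch below checks it numerically with small random matrices $M_i$ standing in for $\frac{m}{m_i} A_{G_i,T}^* A_{G_i,T}$; the helper names, the dimensions and the random stand-ins are assumptions made for illustration only.

```python
import random


def matvec(M, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(M[r][c] * v[c] for c in range(len(v))) for r in range(len(M))]


def golfing_telescope(mats, p0):
    """Run p_i = (I - M_i) p_{i-1} and accumulate u_T = sum_i M_i p_{i-1}.

    mats[i] plays the role of (m/m_i) A*_{G_i,T} A_{G_i,T}.
    Returns (u_T, p_l).
    """
    p = list(p0)
    u_T = [0.0] * len(p0)
    for M in mats:
        step = matvec(M, p)                         # M_i p_{i-1}
        u_T = [a + b for a, b in zip(u_T, step)]    # add the i-th golfing term
        p = [a - b for a, b in zip(p, step)]        # p_i = p_{i-1} - M_i p_{i-1}
    return u_T, p


rng = random.Random(0)
s, l = 4, 6
mats = [[[rng.uniform(-0.2, 0.2) for _ in range(s)] for _ in range(s)]
        for _ in range(l)]
p0 = [rng.choice([-1.0, 1.0]) for _ in range(s)]

u_T, p_l = golfing_telescope(mats, p0)
residual = max(abs(u - (a - b)) for u, a, b in zip(u_T, p0, p_l))
print(residual)   # numerically zero: u_T = p_0 - p_l
```

The point of the scheme is that the error $u_T - p_0 = -p_l$ is exactly the last residual, so the contraction bounds on $\|p_i\|_2$ translate directly into the first requirement on $u_T$.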



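The proof of $\|u_{T^c}\|_\infty \le \frac{1}{8}$

By (3.15), we have $u_{T^c} = \sum_{i=1}^{l} \frac{m}{m_i} A_{G_i,T^c}^* A_{G_i,T}\, p_{i-1}$. Recall that $A_{G_1,:}, \dots, A_{G_l,:}$ are independent, so by the construction of $p_{i-1}$ we know $A_{G_i,:}$ and $p_{i-1}$ are independent. Replacing $A$ with $\sqrt{\frac{m}{m_i}}\,A_{G_i,:}$ in Lemma 3.2 and by the sparsity condition (3.9), we have

$$\left\| \sum_{i=1}^{l} \frac{m}{m_i} A_{G_i,T^c}^* A_{G_i,T}\, p_{i-1} \right\|_\infty \le \sum_{i=1}^{l} \frac{1}{20\sqrt{s}}\, \|p_{i-1}\|_2$$

with high probability, provided $\alpha$ is sufficiently small. By (3.16), (3.17), (3.18) and (3.19), we have

$$\|u_{T^c}\|_\infty \le \sum_{i=1}^{l} \frac{1}{20\sqrt{s}}\, \|p_{i-1}\|_2 \le \frac{1}{20\sqrt{s}}\cdot 2\|p_0\|_2 \le \frac{1}{8}.$$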

i=i 



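The proof of $\|q_{B^c}\|_\infty \le \frac{\lambda}{4}$

For $k = 1, \dots, l$, we denote

$$A_{G_k,:} = \frac{1}{\sqrt{m}} \begin{bmatrix} \tilde{a}_{k1}^* \\ \vdots \\ \tilde{a}_{k m_k}^* \end{bmatrix}, \quad \text{and} \quad A_{B,:} = \frac{1}{\sqrt{m}} \begin{bmatrix} \bar{a}_1^* \\ \vdots \\ \bar{a}_{m_b}^* \end{bmatrix}.$$

By (3.14), (3.13) and (3.12), it suffices to show that for any $1 \le k \le l$ and $1 \le j \le m_k$,

$$\left| \frac{\sqrt{m}}{m_k}\, (\tilde{a}_{kj})_T^* \left(I - \frac{m}{m_{k-1}} A_{G_{k-1},T}^* A_{G_{k-1},T}\right) \cdots \left(I - \frac{m}{m_1} A_{G_1,T}^* A_{G_1,T}\right) \big(\mathrm{sgn}(x_T) - \lambda A_{B,T}^* \mathrm{sgn}(f_B)\big) \right| \le \frac{\lambda}{4}.$$

Set

$$w = \left(I - \frac{m}{m_1} A_{G_1,T}^* A_{G_1,T}\right) \cdots \left(I - \frac{m}{m_{k-1}} A_{G_{k-1},T}^* A_{G_{k-1},T}\right) (\tilde{a}_{kj})_T. \tag{3.20}$$

Then it suffices to prove

$$\left| \frac{\sqrt{m}}{m_k}\, w^* \big(\mathrm{sgn}(x_T) - \lambda A_{B,T}^* \mathrm{sgn}(f_B)\big) \right| \le \frac{\lambda}{4}.$$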



Since $w$ and $\mathrm{sgn}(x_T)$ are independent, by Hoeffding's inequality and conditioning on $w$, we have

$$\mathbb{P}\big(|w^* \mathrm{sgn}(x_T)| \ge t\big) \le 2\exp\left(-\frac{t^2}{2\|w\|_2^2}\right)$$

for any $t > 0$. Then with high probability we have

$$|w^* \mathrm{sgn}(x_T)| \le C\sqrt{\log n}\,\|w\|_2 \tag{3.21}$$

for some absolute constant $C$.
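For a short weight vector, the Hoeffding tail bound for a sum of independent random signs can be compared with the exact tail probability obtained by enumerating all sign patterns. The Python sketch below does this for an arbitrary ten-entry vector standing in for $w$; the weights and thresholds are illustrative assumptions, not quantities from the paper.

```python
import itertools
import math


def exact_tail(weights, t):
    """P(|sum_i weights[i] * eps_i| >= t) for iid uniform signs eps_i,
    computed by enumerating all 2^len(weights) sign patterns."""
    hits = 0
    total = 0
    for signs in itertools.product((-1, 1), repeat=len(weights)):
        total += 1
        if abs(sum(w * e for w, e in zip(weights, signs))) >= t:
            hits += 1
    return hits / total


def hoeffding_bound(weights, t):
    """Hoeffding's bound 2 * exp(-t^2 / (2 * ||weights||_2^2))."""
    norm_sq = sum(w * w for w in weights)
    return 2.0 * math.exp(-t * t / (2.0 * norm_sq))


w = [0.5, -1.0, 0.25, 0.75, 1.5, -0.3, 0.9, 1.1, -0.6, 0.4]
for t in (1.0, 2.0, 3.0, 4.0, 6.0):
    print(t, exact_tail(w, t), hoeffding_bound(w, t))
```

Taking $t = C\sqrt{\log n}\,\|w\|_2$ makes the bound equal to $2n^{-C^2/2}$, which is how a "with high probability" statement such as (3.21) follows after a union bound over polynomially many indices.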



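Setting $z = \mathrm{sgn}(f_B)$, we have $w^* A_{B,T}^* \mathrm{sgn}(f_B) = \frac{1}{\sqrt{m}} \sum_{i=1}^{m_b} [(\bar{a}_i)_T^* w]\, z_{\{i\}}$. Since $w$, $A_{B,T}$ and $z$ are independent, conditioning on $w$ we have

$$\mathbb{E}\{[(\bar{a}_i)_T^* w]\, z_{\{i\}}\} = \mathbb{E}\{(\bar{a}_i)_T^* w\}\, \mathbb{E}\{z_{\{i\}}\} = 0,$$

$$\big|[(\bar{a}_i)_T^* w]\, z_{\{i\}}\big| \le \|w\|_2\, \|(\bar{a}_i)_T\|_2 \le \sqrt{\mu s}\, \|w\|_2 \le \frac{\sqrt{\alpha m}}{\log n}\, \|w\|_2,$$

and

$$\mathbb{E}\big\{\big|[(\bar{a}_i)_T^* w]\, z_{\{i\}}\big|^2\big\} = \|w\|_2^2.$$

By Bernstein's inequality, we have

$$\mathbb{P}\Big(\big|\sqrt{m}\, w^* A_{B,T}^* \mathrm{sgn}(f_B)\big| \ge t\Big) \le 2\exp\left(-\frac{t^2/2}{m_b \|w\|_2^2 + \frac{\sqrt{\alpha m}}{\log n}\, \|w\|_2\, t/3}\right).$$

By choosing some numerical constant $C$ and $t = C\sqrt{m \log n}\,\|w\|_2$, we have

$$|w^* A_{B,T}^* \mathrm{sgn}(f_B)| \le C\sqrt{\log n}\,\|w\|_2 \tag{3.22}$$

with high probability, provided $\alpha$ is sufficiently small.

By (3.21) and (3.22), we have

$$\left| \frac{\sqrt{m}}{m_k}\, w^* \big(\mathrm{sgn}(x_T) - \lambda A_{B,T}^* \mathrm{sgn}(f_B)\big) \right| \le C\, \frac{\sqrt{m \log n}}{m_k}\, \|w\|_2, \tag{3.23}$$

for some numerical constant $C$.

When $k \ge 3$, by (3.20), (3.10) and (3.11), we have $\|w\|_2 \le \left(\frac{1}{2}\right)^{k-1} \frac{1}{\log n} \sqrt{\mu s} \le \frac{\sqrt{\alpha m}}{\log^2 n}$. Recalling $\frac{m}{m_k} \le C \log n$, by (3.23), we have

$$\left| \frac{\sqrt{m}}{m_k}\, w^* \big(\mathrm{sgn}(x_T) - \lambda A_{B,T}^* \mathrm{sgn}(f_B)\big) \right| \le C\, \frac{m}{m_k}\, \sqrt{\alpha}\, (\log n)^{-3/2} \le \frac{\lambda}{4}$$

provided $\alpha$ is sufficiently small.

When $k \le 2$, by (3.20) and (3.10), we have $\|w\|_2 \le \sqrt{\mu s} \le \frac{\sqrt{\alpha m}}{\log n}$. Recalling $\frac{m}{m_k} \le C$, by (3.23), we have

$$\left| \frac{\sqrt{m}}{m_k}\, w^* \big(\mathrm{sgn}(x_T) - \lambda A_{B,T}^* \mathrm{sgn}(f_B)\big) \right| \le C \sqrt{\alpha}\, (\log n)^{-1/2} \le \frac{\lambda}{4}$$

provided $\alpha$ is sufficiently small.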



Here we would like to compare our golfing scheme with that in [7]. There are mainly two differences. One is that we have an extra term $\lambda A_{B,:}^* \mathrm{sgn}(f_B)$ in the dual vector. To obtain the inequality $\|v_{T^c}\|_\infty \le 1/4$, we propose to bound $\|u_{T^c}\|_\infty$ and $\|\lambda A_{B,:}^* \mathrm{sgn}(f_B)\|_\infty$ separately, and this leads to the extra log factor compared with [7]. Moreover, by using the golfing scheme to construct the dual vector, we need to bound the term $\|q_{B^c}\|_\infty$, which is not necessary in [7]. This inevitably incurs the random signs assumption on the signal.

4 A Proof of Theorem 1.3



In this section, the capital letters $X$, $Y$ etc. represent matrices, and the symbols in script font $\mathcal{I}$, $\mathcal{P}_T$ etc. represent linear operators from a matrix space to a matrix space. Moreover, for any $\Omega_0 \subset [n] \times [n]$, the operator $\mathcal{P}_{\Omega_0}$ keeps the entries of $M$ on the support $\Omega_0$ and changes the other entries into zeros. For any $n \times n$ matrix $A$, denote by $\|A\|_F$, $\|A\|$, $\|A\|_\infty$ and $\|A\|_*$ respectively the Frobenius norm, operator norm (the largest singular value), the biggest magnitude of all elements, and the nuclear norm (the sum of all singular values).

Similarly to Section 3, instead of denoting numerical constants as $C_1, C_2, \dots$ we just use $C$, whose values change from line to line. Also, we will use the phrase "with high probability" to mean with probability at least $1 - Cn^{-c}$, where $C > 0$ is a numerical constant and $c = 3$, $4$, or $5$ depending on the context.

4.1 A model equivalent to Model 3.1 

Model 3.1 is natural and has been used in prior work, but we will use the following equivalent model for the convenience of proof:

Model 3.2: 1. Fix an $n$ by $n$ matrix $K$, whose entries are either $1$ or $-1$.

2. Define two independent random subsets of $[n] \times [n]$: $\Gamma' \sim \mathrm{Ber}((1-2s)p)$ and $\Omega' \sim \mathrm{Ber}\left(\frac{2sp}{1-p+2sp}\right)$. Moreover, let $O := \Gamma' \cup \Omega'$, which thus satisfies $O \sim \mathrm{Ber}(p)$.

3. Define an $n \times n$ random matrix $W$ with independent entries $W_{ij}$ satisfying $\mathbb{P}(W_{ij} = 1) = \mathbb{P}(W_{ij} = -1) = \frac{1}{2}$.

4. Define $\Omega'' \subset \Omega'$: $\Omega'' := \{(i,j) : (i,j) \in \Omega',\ W_{ij} = K_{ij}\}$.

5. Define $\Omega := \Omega''/\Gamma'$, and $\Gamma := O/\Omega$.

6. Let $S$ satisfy $\mathrm{sgn}(S) := \mathcal{P}_\Omega(K)$.

Obviously, in both Model 3.1 and Model 3.2 the whole setting is deterministic if we fix $(O, \Omega)$. Therefore, the probability of $(\hat{L}, \hat{S}) = (L, S)$ is determined by the joint distribution of $(O, \Omega)$. It is not difficult to prove that the joint distributions of $(O, \Omega)$ in both models are the same. Indeed, in Model 3.1, we have that $(1_{\{(i,j) \in O\}}, 1_{\{(i,j) \in \Omega\}})$ are iid random vectors with the probability distribution $\mathbb{P}(1_{\{(i,j) \in O\}} = 1) = p$, $\mathbb{P}(1_{\{(i,j) \in \Omega\}} = 1 \mid 1_{\{(i,j) \in O\}} = 1) = s$ and $\mathbb{P}(1_{\{(i,j) \in \Omega\}} = 1 \mid 1_{\{(i,j) \in O\}} = 0) = 0$. In Model 3.2, we have

$$\big(1_{\{(i,j) \in O\}},\ 1_{\{(i,j) \in \Omega\}}\big) = \big(\max(1_{\{(i,j) \in \Gamma'\}}, 1_{\{(i,j) \in \Omega'\}}),\ 1_{\{(i,j) \in \Omega'\}}\, 1_{\{W_{ij} = K_{ij}\}}\, 1_{\{(i,j) \notin \Gamma'\}}\big).$$

This implies that $(1_{\{(i,j) \in O\}}, 1_{\{(i,j) \in \Omega\}})$ are independent random vectors. Moreover, it is easy to calculate that $\mathbb{P}(1_{\{(i,j) \in O\}} = 1) = p$, $\mathbb{P}(1_{\{(i,j) \in \Omega\}} = 1) = sp$ and $\mathbb{P}(1_{\{(i,j) \in \Omega\}} = 1,\ 1_{\{(i,j) \in O\}} = 0) = 0$. Then we have

$$\mathbb{P}(1_{\{(i,j) \in \Omega\}} = 1 \mid 1_{\{(i,j) \in O\}} = 1) = \mathbb{P}(1_{\{(i,j) \in \Omega\}} = 1,\ 1_{\{(i,j) \in O\}} = 1)\,/\,\mathbb{P}(1_{\{(i,j) \in O\}} = 1) = s,$$

and

$$\mathbb{P}(1_{\{(i,j) \in \Omega\}} = 1 \mid 1_{\{(i,j) \in O\}} = 0) = \mathbb{P}(1_{\{(i,j) \in \Omega\}} = 1,\ 1_{\{(i,j) \in O\}} = 0)\,/\,\mathbb{P}(1_{\{(i,j) \in O\}} = 0) = 0.$$

Notice that although $(1_{\{(i,j) \in O\}}, 1_{\{(i,j) \in \Omega\}})$ depends on $K$, its distribution does not. By the above we know that $(O, \Omega)$ has the same distribution in both models. Therefore in the following we will use Model 3.2 instead. The advantage of using Model 3.2 is that we can utilize $\Gamma'$, $\Omega'$, $W$, etc. as auxiliaries.
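The per-entry probabilities of Model 3.2 can be checked with exact rational arithmetic. The sketch below uses Python's `fractions` module; the function name and the sample values $p = 3/10$, $s = 1/20$ are assumptions chosen only to illustrate the calculation.

```python
from fractions import Fraction


def model_3_2_marginals(p, s):
    """Per-entry probabilities in Model 3.2.

    Returns (P(entry in O), P(entry in Omega), P(entry in Omega | entry in O)).
    """
    p, s = Fraction(p), Fraction(s)
    prob_gamma = (1 - 2 * s) * p                           # P((i,j) in Gamma')
    prob_omega_prime = 2 * s * p / (1 - p + 2 * s * p)     # P((i,j) in Omega')
    # O is the union of the independent sets Gamma' and Omega'
    prob_O = 1 - (1 - prob_gamma) * (1 - prob_omega_prime)
    # (i,j) in Omega iff it is in Omega', W_ij = K_ij (prob. 1/2), and not in Gamma'
    prob_Omega = prob_omega_prime * Fraction(1, 2) * (1 - prob_gamma)
    # Omega is a subset of O, so the joint probability equals P(Omega)
    return prob_O, prob_Omega, prob_Omega / prob_O


prob_O, prob_Omega, conditional = model_3_2_marginals(Fraction(3, 10), Fraction(1, 20))
print(prob_O, prob_Omega, conditional)   # 3/10 3/200 1/20
```

The output reproduces $\mathbb{P}((i,j) \in O) = p$, $\mathbb{P}((i,j) \in \Omega) = sp$ and the conditional probability $s$, matching the distribution stated for Model 3.1.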

In the next section we prove some supporting lemmas which are useful for the proof of the main 
theorem. 

4.2 Supporting lemmas 

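Define $T := \{UX^* + YV^*,\ X, Y \in \mathbb{R}^{n \times r}\}$, a subspace of $\mathbb{R}^{n \times n}$. Then the orthogonal projectors $\mathcal{P}_T$ and $\mathcal{P}_{T^\perp}$ in $\mathbb{R}^{n \times n}$ satisfy $\mathcal{P}_T X = UU^* X + XVV^* - UU^* XVV^*$ and $\mathcal{P}_{T^\perp} X = (I - UU^*) X (I - VV^*)$ for any $X \in \mathbb{R}^{n \times n}$. This means $\|\mathcal{P}_{T^\perp} X\| \le \|X\|$ for any $X$. Recalling the incoherence conditions: for any $i \in \{1, \dots, n\}$, $\|UU^* e_i\|_2^2 \le \frac{\mu r}{n}$ and $\|VV^* e_i\|_2^2 \le \frac{\mu r}{n}$, we have $\|\mathcal{P}_T(e_i e_j^*)\|_\infty \le \frac{2\mu r}{n}$ and $\|\mathcal{P}_T(e_i e_j^*)\|_F \le \sqrt{\frac{2\mu r}{n}}$.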
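The two projector formulas can be sanity-checked on a tiny example: $\mathcal{P}_T X + \mathcal{P}_{T^\perp} X = X$ and $\mathcal{P}_T(\mathcal{P}_T X) = \mathcal{P}_T X$. The following Python sketch uses plain nested lists with $n = 3$ and $r = 1$; the helper names and the particular unit vectors $U$, $V$ are illustrative assumptions.

```python
import math


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def combine(A, B, sign=1.0):
    """Entrywise A + sign * B."""
    return [[a + sign * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def make_projectors(U, V):
    """Return (P_T, P_T_perp) for orthonormal column matrices U and V."""
    PU = matmul(U, transpose(U))          # U U^*
    PV = matmul(V, transpose(V))          # V V^*
    n = len(U)
    I = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

    def P_T(X):
        # U U^* X + X V V^* - U U^* X V V^*
        both = matmul(matmul(PU, X), PV)
        return combine(combine(matmul(PU, X), matmul(X, PV)), both, -1.0)

    def P_T_perp(X):
        # (I - U U^*) X (I - V V^*)
        return matmul(matmul(combine(I, PU, -1.0), X), combine(I, PV, -1.0))

    return P_T, P_T_perp


def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


U = [[1 / math.sqrt(2)], [1 / math.sqrt(2)], [0.0]]   # unit column, r = 1
V = [[0.0], [0.6], [0.8]]                             # unit column, r = 1
X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 10.0]]

P_T, P_T_perp = make_projectors(U, V)
print(max_abs_diff(combine(P_T(X), P_T_perp(X)), X))   # decomposition error
print(max_abs_diff(P_T(P_T(X)), P_T(X)))               # idempotence error
```

Both printed errors are at the level of floating-point rounding, consistent with $\mathcal{P}_T$ and $\mathcal{P}_{T^\perp}$ being complementary orthogonal projectors.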

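Lemma 4.1 (Theorem 4.1 of [?]) Suppose $\Omega_0 \sim \mathrm{Ber}(\rho_0)$. Then with high probability, $\|\mathcal{P}_T - \rho_0^{-1} \mathcal{P}_T \mathcal{P}_{\Omega_0} \mathcal{P}_T\| \le \epsilon$, provided that $\rho_0 \ge C_0\, \epsilon^{-2}\, \frac{\mu r \log n}{n}$ for some numerical constant $C_0 > 0$.

The original idea of the proof of this theorem is due to [36].

Lemma 4.2 (Theorem 3.1 of [?]) Suppose $Z \in \mathrm{Range}(\mathcal{P}_T)$ is a fixed matrix, $\Omega_0 \sim \mathrm{Ber}(\rho_0)$, and $\epsilon \le 1$ is an arbitrary constant. Then with high probability $\|(\mathcal{I} - \rho_0^{-1} \mathcal{P}_T \mathcal{P}_{\Omega_0}) Z\|_\infty \le \epsilon \|Z\|_\infty$ provided that $\rho_0 \ge C_0\, \epsilon^{-2}\, \frac{\mu r \log n}{n}$ for some numerical constant $C_0 > 0$.

Lemma 4.3 (Theorem 6.3 of [8]) Suppose $Z$ is a fixed matrix, and $\Omega_0 \sim \mathrm{Ber}(\rho_0)$. Then with high probability, $\|(\rho_0 \mathcal{I} - \mathcal{P}_{\Omega_0}) Z\| \le C_0' \sqrt{n \rho \log n}\, \|Z\|_\infty$ provided that $\rho_0 \le \rho$ and $\rho \ge C_0 \frac{\log n}{n}$ for some numerical constants $C_0 > 0$ and $C_0' > 0$.

Notice that we only have $\rho_0 = \rho$ in Theorem 6.3 of [8]. By a very slight modification in the proof (specifically, the proof of Lemma 6.2) we can have $\rho_0 \le \rho$ as stated above.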

4.3 A proof of Theorem 1.3 

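By Lemma 4.1, we have $\left\| \frac{1}{(1-2s)p} \mathcal{P}_T \mathcal{P}_{\Gamma'} \mathcal{P}_T - \mathcal{P}_T \right\| \le \frac{1}{2}$ and $\left\| \frac{1}{\sqrt{(1-2s)p}} \mathcal{P}_T \mathcal{P}_{\Gamma'} \right\| \le \sqrt{3/2}$ with high probability provided $C_p$ is sufficiently large and $C_s$ is sufficiently small. We will assume both inequalities hold throughout the paper.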

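Theorem 4.4 If there exists an $n \times n$ matrix $Y$ obeying

$$\begin{cases}
\|\mathcal{P}_T Y + \mathcal{P}_T(\lambda \mathcal{P}_{O/\Gamma'} W - UV^*)\|_F \le \frac{\lambda}{n^2}, \\
\|\mathcal{P}_{T^\perp} Y + \mathcal{P}_{T^\perp}(\lambda \mathcal{P}_{O/\Gamma'} W)\| \le \frac{1}{4}, \\
\mathcal{P}_{\Gamma'^c} Y = 0, \\
\|\mathcal{P}_{\Gamma'} Y\|_\infty \le \frac{\lambda}{4},
\end{cases} \tag{4.1}$$

where $\lambda$ is the regularization parameter in (1.6), then the solution $(\hat{L}, \hat{S})$ to (1.6) satisfies $(\hat{L}, \hat{S}) = (L, S)$.

Proof Set $H = \hat{L} - L$. The condition $\mathcal{P}_O(\hat{L}) + \hat{S} = \mathcal{P}_O(L) + S$ implies that $\mathcal{P}_O(H) = S - \hat{S}$. Then $\hat{S}$ is supported on $O$ because $S$ is supported on $\Omega \subset O$. By considering the subgradient of the nuclear norm at $L$, we have

$$\|\hat{L}\|_* \ge \|L\|_* + \langle \mathcal{P}_T H, UV^* \rangle + \|\mathcal{P}_{T^\perp} H\|_*.$$

By the definition of $(\hat{L}, \hat{S})$, we have

$$\|\hat{L}\|_* + \lambda \|\hat{S}\|_1 \le \|L\|_* + \lambda \|S\|_1.$$

By the two inequalities above, we have

$$\lambda \|S\|_1 - \lambda \|\hat{S}\|_1 \ge \langle \mathcal{P}_T(H), UV^* \rangle + \|\mathcal{P}_{T^\perp} H\|_*,$$

which implies

$$\lambda \|S\|_1 - \lambda \|\mathcal{P}_{O/\Gamma'}(\hat{S})\|_1 \ge \langle H, UV^* \rangle + \|\mathcal{P}_{T^\perp}(H)\|_* + \lambda \|\mathcal{P}_{\Gamma'}(\hat{S})\|_1.$$

On the other hand,

$$\begin{aligned}
\|\mathcal{P}_{O/\Gamma'} \hat{S}\|_1 &= \|S + \mathcal{P}_{O/\Gamma'}(-H)\|_1 \\
&\ge \|S\|_1 + \langle \mathrm{sgn}(S), \mathcal{P}_\Omega(-H) \rangle + \|\mathcal{P}_{O/(\Gamma' \cup \Omega)}(-H)\|_1 \\
&\ge \|S\|_1 + \langle \mathcal{P}_{O/\Gamma'}(W), -H \rangle.
\end{aligned}$$

By the two inequalities above and the fact $\mathcal{P}_{\Gamma'} \hat{S} = \mathcal{P}_{\Gamma'}(\hat{S} - S) = -\mathcal{P}_{\Gamma'} H$, we have

$$\|\mathcal{P}_{T^\perp}(H)\|_* + \lambda \|\mathcal{P}_{\Gamma'}(H)\|_1 \le \langle H, \lambda \mathcal{P}_{O/\Gamma'}(W) - UV^* \rangle. \tag{4.2}$$

By the assumptions on $Y$, we have

$$\begin{aligned}
\langle H, \lambda \mathcal{P}_{O/\Gamma'}(W) - UV^* \rangle
&= \langle H, Y + \lambda \mathcal{P}_{O/\Gamma'}(W) - UV^* \rangle - \langle H, Y \rangle \\
&= \langle \mathcal{P}_T(H), \mathcal{P}_T(Y + \lambda \mathcal{P}_{O/\Gamma'}(W) - UV^*) \rangle + \langle \mathcal{P}_{T^\perp}(H), \mathcal{P}_{T^\perp}(Y + \lambda \mathcal{P}_{O/\Gamma'}(W)) \rangle \\
&\quad - \langle \mathcal{P}_{\Gamma'}(H), \mathcal{P}_{\Gamma'}(Y) \rangle - \langle \mathcal{P}_{\Gamma'^c}(H), \mathcal{P}_{\Gamma'^c}(Y) \rangle \\
&\le \frac{\lambda}{n^2} \|\mathcal{P}_T(H)\|_F + \frac{1}{4} \|\mathcal{P}_{T^\perp}(H)\|_* + \frac{\lambda}{4} \|\mathcal{P}_{\Gamma'}(H)\|_1.
\end{aligned}$$

By inequality (4.2),

$$\frac{3}{4} \|\mathcal{P}_{T^\perp}(H)\|_* + \frac{3\lambda}{4} \|\mathcal{P}_{\Gamma'}(H)\|_1 \le \frac{\lambda}{n^2} \|\mathcal{P}_T(H)\|_F. \tag{4.3}$$

Recall that we assume $\left\| \frac{1}{(1-2s)p} \mathcal{P}_T \mathcal{P}_{\Gamma'} \mathcal{P}_T - \mathcal{P}_T \right\| \le \frac{1}{2}$ and $\left\| \frac{1}{\sqrt{(1-2s)p}} \mathcal{P}_T \mathcal{P}_{\Gamma'} \right\| \le \sqrt{3/2}$ throughout the paper. Then

$$\begin{aligned}
\|\mathcal{P}_T(H)\|_F &\le 2 \left\| \frac{1}{(1-2s)p} \mathcal{P}_T \mathcal{P}_{\Gamma'} \mathcal{P}_T(H) \right\|_F \\
&\le 2 \left\| \frac{1}{(1-2s)p} \mathcal{P}_T \mathcal{P}_{\Gamma'} \mathcal{P}_{T^\perp}(H) \right\|_F + 2 \left\| \frac{1}{(1-2s)p} \mathcal{P}_T \mathcal{P}_{\Gamma'}(H) \right\|_F \\
&\le \sqrt{\frac{6}{(1-2s)p}}\, \|\mathcal{P}_{T^\perp}(H)\|_F + \sqrt{\frac{6}{(1-2s)p}}\, \|\mathcal{P}_{\Gamma'} H\|_F.
\end{aligned}$$

By inequality (4.3) we have

$$\left( \frac{3}{4} - \frac{\lambda}{n^2} \sqrt{\frac{6}{(1-2s)p}} \right) \|\mathcal{P}_{T^\perp}(H)\|_F + \left( \frac{3\lambda}{4} - \frac{\lambda}{n^2} \sqrt{\frac{6}{(1-2s)p}} \right) \|\mathcal{P}_{\Gamma'} H\|_F \le 0.$$

Then $\mathcal{P}_{T^\perp}(H) = \mathcal{P}_{\Gamma'} H = 0$, which implies $\mathcal{P}_{\Gamma'} \mathcal{P}_T(H) = 0$. Since $\mathcal{P}_{\Gamma'} \mathcal{P}_T$ is injective on $T$ (because $\left\| \frac{1}{(1-2s)p} \mathcal{P}_T \mathcal{P}_{\Gamma'} \mathcal{P}_T - \mathcal{P}_T \right\| \le \frac{1}{2}$), we have $\mathcal{P}_T(H) = 0$. Then we have $H = 0$. $\square$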

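Suppose we can construct $Y$ and $\tilde{Y}$ satisfying

$$\begin{cases}
\|\mathcal{P}_T Y + \mathcal{P}_T(\lambda \mathcal{P}_{\Omega'} W - UV^*)\|_F \le \frac{\lambda}{n^2}, \\
\|\mathcal{P}_{T^\perp} Y + \mathcal{P}_{T^\perp}(\lambda \mathcal{P}_{\Omega'} W)\| \le \frac{1}{4}, \\
\mathcal{P}_{\Gamma'^c} Y = 0, \\
\|\mathcal{P}_{\Gamma'} Y\|_\infty \le \frac{\lambda}{4},
\end{cases} \tag{4.4}$$

and

$$\begin{cases}
\|\mathcal{P}_T \tilde{Y} + \mathcal{P}_T(\lambda(2\mathcal{P}_{\Omega'/\Gamma'}(W) - \mathcal{P}_{\Omega'} W) - UV^*)\|_F \le \frac{\lambda}{n^2}, \\
\|\mathcal{P}_{T^\perp} \tilde{Y} + \mathcal{P}_{T^\perp}(\lambda(2\mathcal{P}_{\Omega'/\Gamma'}(W) - \mathcal{P}_{\Omega'} W))\| \le \frac{1}{4}, \\
\mathcal{P}_{\Gamma'^c} \tilde{Y} = 0, \\
\|\mathcal{P}_{\Gamma'} \tilde{Y}\|_\infty \le \frac{\lambda}{4}.
\end{cases} \tag{4.5}$$

Then $(Y + \tilde{Y})/2$ will satisfy (4.1), since $O/\Gamma' = \Omega'/\Gamma'$. By the assumptions in Model 3.2, $(\Gamma', \mathcal{P}_{\Omega'} W)$ and $(\Gamma', 2\mathcal{P}_{\Omega'/\Gamma'}(W) - \mathcal{P}_{\Omega'} W)$ have the same distribution. Therefore, if we can construct $Y$ satisfying (4.4) with high probability, we can also construct $\tilde{Y}$ satisfying (4.5) with high probability. Therefore to prove Theorem 1.3, we only need to prove that there exists $Y$ satisfying (4.4) with high probability: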
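The averaging step rests on an entrywise identity: the mean of $\mathcal{P}_{\Omega'} W$ and $2\mathcal{P}_{\Omega'/\Gamma'}(W) - \mathcal{P}_{\Omega'} W$ is $\mathcal{P}_{\Omega'/\Gamma'} W$, and $\Omega'/\Gamma'$ coincides with $O/\Gamma'$ because $O = \Gamma' \cup \Omega'$. A small Python sketch with random index sets confirms this; the helper names, the matrix size and the sampling rates are illustrative assumptions.

```python
import random


def restrict(W, support):
    """P_support(W): keep entries indexed by `support`, zero elsewhere."""
    n = len(W)
    return [[W[i][j] if (i, j) in support else 0 for j in range(n)]
            for i in range(n)]


def averaged_pattern(W, gamma_prime, omega_prime):
    """Entrywise average of P_{Omega'}W and 2 P_{Omega'/Gamma'}W - P_{Omega'}W."""
    n = len(W)
    first = restrict(W, omega_prime)
    kept = restrict(W, omega_prime - gamma_prime)
    second = [[2 * kept[i][j] - first[i][j] for j in range(n)] for i in range(n)]
    # first + second is always even, so integer division is exact
    return [[(first[i][j] + second[i][j]) // 2 for j in range(n)]
            for i in range(n)]


rng = random.Random(1)
n = 6
entries = [(i, j) for i in range(n) for j in range(n)]
gamma_prime = {e for e in entries if rng.random() < 0.4}
omega_prime = {e for e in entries if rng.random() < 0.3}
W = [[rng.choice([-1, 1]) for _ in range(n)] for _ in range(n)]

O = gamma_prime | omega_prime
target = restrict(W, O - gamma_prime)            # P_{O/Gamma'} W
print(averaged_pattern(W, gamma_prime, omega_prime) == target)
```

The second pattern differs from the first only by a sign flip on $\Omega' \cap \Gamma'$, which is why the two have the same distribution given $\Gamma'$ when $W$ has independent symmetric signs.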

Proof (of Theorem 1.3) Notice that $\Gamma' \sim \mathrm{Ber}((1-2s)p)$. Suppose that $q$ satisfies $1 - (1-2s)p = \left(1 - \frac{(1-2s)p}{6}\right)^2 (1-q)^{l-2}$, where $l = \lfloor 5 \log n + 1 \rfloor$. This implies that $q \ge Cp/\log(n)$. Define $q_1 = q_2 = (1-2s)p/6$ and $q_3 = \dots = q_l = q$. Then in distribution we can let $\Gamma' = \Gamma_1 \cup \dots \cup \Gamma_l$, where $\Gamma_j \sim \mathrm{Ber}(q_j)$ independently.
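The defining equation for $q$ can be solved in closed form, $q = 1 - \left(\frac{1 - (1-2s)p}{(1 - (1-2s)p/6)^2}\right)^{1/(l-2)}$. The Python sketch below computes the non-uniform sampling rates and checks that the union of the $l$ independent Bernoulli sets has rate $(1-2s)p$; the function name and the values $n = 1000$, $p = 0.2$, $s = 0.05$ are assumptions for illustration.

```python
import math


def golfing_probabilities(n, p, s):
    """Return (l, q_big, q): the number of golfing blocks, the rate
    q_1 = q_2 = (1-2s)p/6, and the rate q_3 = ... = q_l = q."""
    l = math.floor(5 * math.log(n) + 1)
    rho = (1 - 2 * s) * p                     # sampling rate of Gamma'
    q_big = rho / 6
    ratio = (1 - rho) / (1 - q_big) ** 2      # equals (1 - q)^(l - 2)
    q = 1 - ratio ** (1.0 / (l - 2))
    return l, q_big, q


n, p, s = 1000, 0.2, 0.05
l, q_big, q = golfing_probabilities(n, p, s)

# probability that an entry is missed by all l independent Bernoulli sets
union_miss = (1 - q_big) ** 2 * (1 - q) ** (l - 2)
print(l, q_big, q)
print(union_miss, 1 - (1 - 2 * s) * p)        # the two numbers agree
print(q / (p / math.log(n)))                  # ratio of q to p / log n
```

The first two blocks are sampled at a rate proportional to $p$, while the remaining blocks are thinner by roughly a factor of $\log n$; this is the non-uniform block-size choice that the proof exploits.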
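Construct

$$\begin{cases}
Z_0 = \mathcal{P}_T(UV^* - \lambda \mathcal{P}_{\Omega'} W), \\
Z_j = \left(\mathcal{P}_T - \frac{1}{q_j} \mathcal{P}_T \mathcal{P}_{\Gamma_j} \mathcal{P}_T\right) Z_{j-1} \quad \text{for } j = 1, \dots, l, \\
Y = \sum_{j=1}^{l} \frac{1}{q_j} \mathcal{P}_{\Gamma_j} Z_{j-1}.
\end{cases}$$

Then by Lemma 4.1 we have

$$\|Z_j\|_F \le \frac{1}{2} \|Z_{j-1}\|_F \quad \text{for } j = 1, \dots, l$$

with high probability provided $C_p$ is large enough and $C_s$ is small enough. Then $\|Z_j\|_F \le \left(\frac{1}{2}\right)^j \|Z_0\|_F$. By the construction of $Z_j$, we know that $Z_j \in \mathrm{Range}(\mathcal{P}_T)$ and $Z_j = \left(\mathcal{I} - \frac{1}{q_j} \mathcal{P}_T \mathcal{P}_{\Gamma_j}\right) Z_{j-1}$. Then similarly, by Lemma 4.2 we have

$$\|Z_1\|_\infty \le \frac{1}{2\sqrt{\log n}}\, \|Z_0\|_\infty$$

and

$$\|Z_j\|_\infty \le \frac{1}{2^j \log n}\, \|Z_0\|_\infty \quad \text{for } j = 2, \dots, l$$

with high probability provided $C_p$ is large enough and $C_s$ is small enough. Also, by Lemma 4.3 we have

$$\left\| \left(\mathcal{I} - \frac{1}{q_j} \mathcal{P}_{\Gamma_j}\right) Z_{j-1} \right\| \le C \sqrt{\frac{n \log n}{q_j}}\, \|Z_{j-1}\|_\infty \quad \text{for } j = 1, \dots, l$$

with high probability provided $C_p$ is large enough and $C_s$ is small enough.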

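We first bound $\|Z_0\|_F$ and $\|Z_0\|_\infty$. Obviously $\|Z_0\|_\infty \le \|UV^*\|_\infty + \lambda \|\mathcal{P}_T \mathcal{P}_{\Omega'}(W)\|_\infty$. Recall that for any $i, j \in [n]$, we have $\|\mathcal{P}_T(e_i e_j^*)\|_\infty \le \frac{2\mu r}{n}$ and $\|\mathcal{P}_T(e_i e_j^*)\|_F \le \sqrt{\frac{2\mu r}{n}}$. Moreover, the entries $(\mathcal{P}_{\Omega'}(W))_{ij}$ are iid random variables with the distribution

$$(\mathcal{P}_{\Omega'}(W))_{ij} = \begin{cases}
1 & \text{with probability } \frac{sp}{1-p+2sp}, \\
0 & \text{with probability } \frac{1-p}{1-p+2sp}, \\
-1 & \text{with probability } \frac{sp}{1-p+2sp}.
\end{cases}$$

Then by Bernstein's inequality, we have

$$\mathbb{P}\big(|\langle \mathcal{P}_T(\mathcal{P}_{\Omega'}(W)), e_i e_j^* \rangle| \ge t\big) = \mathbb{P}\big(|\langle \mathcal{P}_{\Omega'}(W), \mathcal{P}_T(e_i e_j^*) \rangle| \ge t\big) \le 2\exp\left(-\frac{t^2/2}{\sum \mathbb{E} X_k^2 + Mt/3}\right),$$

where we have

$$\sum \mathbb{E} X_k^2 = \frac{2sp}{1-p+2sp}\, \|\mathcal{P}_T(e_i e_j^*)\|_F^2 \le \frac{2sp}{1-p+2sp} \cdot \frac{2\mu r}{n}$$

and

$$M = \|\mathcal{P}_T(e_i e_j^*)\|_\infty \le \frac{2\mu r}{n}.$$

Then with high probability we have $\|\mathcal{P}_T \mathcal{P}_{\Omega'}(W)\|_\infty \le C\sqrt{\frac{p \mu r \log n}{n}}$, using $p \ge C_p \frac{\mu r \log^2 n}{n} \ge C_p \frac{\mu r \log n}{n}$. Then by $\|UV^*\|_\infty \le \frac{\sqrt{\mu r}}{n}$ we have $\|Z_0\|_\infty \le C \frac{\sqrt{\mu r}}{n}$, which implies $\|Z_0\|_F \le n \|Z_0\|_\infty \le C\sqrt{\mu r}$.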



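Now we want to prove $Y$ satisfies (4.4) with high probability. Obviously $\mathcal{P}_{\Gamma'^c} Y = 0$. It suffices to prove

$$\begin{cases}
\|\mathcal{P}_T Y + \mathcal{P}_T(\lambda \mathcal{P}_{\Omega'}(W) - UV^*)\|_F \le \frac{\lambda}{n^2}, \\
\|\mathcal{P}_{T^\perp} Y\| \le \frac{1}{8}, \\
\|\mathcal{P}_{T^\perp}(\lambda \mathcal{P}_{\Omega'}(W))\| \le \frac{1}{8}, \\
\|\mathcal{P}_{\Gamma'} Y\|_\infty \le \frac{\lambda}{4}.
\end{cases} \tag{4.6}$$

First,

$$\begin{aligned}
\|\mathcal{P}_T Y + \mathcal{P}_T(\lambda \mathcal{P}_{\Omega'}(W) - UV^*)\|_F
&= \Big\| Z_0 - \sum_{j=1}^{l} \frac{1}{q_j} \mathcal{P}_T \mathcal{P}_{\Gamma_j} Z_{j-1} \Big\|_F \\
&= \Big\| \mathcal{P}_T Z_0 - \sum_{j=1}^{l} \frac{1}{q_j} \mathcal{P}_T \mathcal{P}_{\Gamma_j} \mathcal{P}_T Z_{j-1} \Big\|_F \\
&= \Big\| \Big(\mathcal{P}_T - \frac{1}{q_1} \mathcal{P}_T \mathcal{P}_{\Gamma_1} \mathcal{P}_T\Big) Z_0 - \sum_{j=2}^{l} \frac{1}{q_j} \mathcal{P}_T \mathcal{P}_{\Gamma_j} \mathcal{P}_T Z_{j-1} \Big\|_F \\
&= \Big\| \mathcal{P}_T Z_1 - \sum_{j=2}^{l} \frac{1}{q_j} \mathcal{P}_T \mathcal{P}_{\Gamma_j} \mathcal{P}_T Z_{j-1} \Big\|_F \\
&= \dots = \|Z_l\|_F \le C \left(\frac{1}{2}\right)^{l} \sqrt{\mu r} \le \frac{\lambda}{n^2}.
\end{aligned}$$

Second,

$$\begin{aligned}
\|\mathcal{P}_{T^\perp} Y\| &= \Big\| \mathcal{P}_{T^\perp} \sum_{j=1}^{l} \frac{1}{q_j} \mathcal{P}_{\Gamma_j} Z_{j-1} \Big\| \\
&\le \sum_{j=1}^{l} \Big\| \frac{1}{q_j} \mathcal{P}_{T^\perp} \mathcal{P}_{\Gamma_j} Z_{j-1} \Big\| \\
&= \sum_{j=1}^{l} \Big\| \mathcal{P}_{T^\perp} \Big( \frac{1}{q_j} \mathcal{P}_{\Gamma_j} Z_{j-1} - Z_{j-1} \Big) \Big\| \\
&\le \sum_{j=1}^{l} \Big\| \frac{1}{q_j} \mathcal{P}_{\Gamma_j} Z_{j-1} - Z_{j-1} \Big\| \\
&\le C \sum_{j=1}^{l} \sqrt{\frac{n \log n}{q_j}}\, \|Z_{j-1}\|_\infty \\
&\le C \sqrt{n \log n} \left( \frac{1}{\sqrt{q_1}} + \frac{1}{2\sqrt{\log n}\,\sqrt{q_2}} + \sum_{j=3}^{l} \frac{1}{2^{j-1} \log n\, \sqrt{q_j}} \right) \|Z_0\|_\infty \\
&\le C \sqrt{\frac{\mu r \log n}{n p}} \le \frac{1}{8},
\end{aligned}$$

provided $C_p$ is sufficiently large.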

Third, we have $\|\lambda \mathcal{P}_{T^\perp} \mathcal{P}_{\Omega'}(W)\| \le \lambda \|\mathcal{P}_{\Omega'}(W)\|$. Notice that $W_{ij}$ is an independent Rademacher sequence independent of $\Omega'$. By Lemma 4.3 we have

$$\left\| \frac{2sp}{1-p+2sp}\, W - \mathcal{P}_{\Omega'}(W) \right\| \le C_0' \sqrt{n \rho \log n}$$

with high probability provided $\frac{2sp}{1-p+2sp} \le \rho$ and $\rho \ge C_0 \frac{\log n}{n}$. By Theorem 3.9 of [39], we have $\|W\| \le C_1 \sqrt{n}$ with high probability. Therefore,

$$\|\mathcal{P}_{\Omega'}(W)\| \le C_0' \sqrt{n \rho \log n} + C_1 \sqrt{n}\, \frac{2sp}{1-p+2sp}.$$

By choosing $\rho$ appropriately, with some suitable constant $C_2$, we have $\|\mathcal{P}_{\Omega'}(W)\| \le \frac{\sqrt{np \log n}}{8}$, provided $C_p$ is large enough and $C_s$ is small enough.

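Fourth,

$$\begin{aligned}
\|\mathcal{P}_{\Gamma'} Y\|_\infty &= \Big\| \sum_{j=1}^{l} \frac{1}{q_j} \mathcal{P}_{\Gamma_j} Z_{j-1} \Big\|_\infty \le \sum_{j=1}^{l} \frac{1}{q_j} \|Z_{j-1}\|_\infty \\
&\le \left( \frac{1}{q_1} + \frac{1}{2\sqrt{\log n}}\, \frac{1}{q_2} + \sum_{j=3}^{l} \frac{1}{2^{j-1} \log n}\, \frac{1}{q_j} \right) \|Z_0\|_\infty \\
&\le C\, \frac{\sqrt{\mu r}}{np} \le \frac{\lambda}{4},
\end{aligned}$$

provided $C_p$ is sufficiently large.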



Notice that in [3] the authors used a very similar golfing scheme. In comparison, we use here a golfing scheme with non-uniform block sizes to achieve a result with fewer log factors. Moreover, whereas in [3] the authors used both the golfing scheme and the least squares method to construct two parts of the dual matrix, here we only use the golfing scheme. Actually, the method of constructing the dual matrix in [3] cannot be applied directly to our problem when $p = O(r \log^2 n / n)$.